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PREFACE 


This  report  presents  the  essential  results  of  our  experimentation 
in  implementing  a  new  convolutional  decoding  algorithm  on  an  inexpen¬ 
sive  Zilog  Z-80  microcomputer  syst^b*>  The  work  reported  was  performed 
in  FT  79  as  part  of  the;  MITRE  MkUILpYoj ect  7010:  j  Low  Cost  Electronics. 

The  multiple  stack  algorithm  (MSA)  was  designed  by  Chevillat  and 
Costello  and  first  reported  in  1977  [1],  Their  work  is  the  basis 
from  which  we  began  our  studies.  The  experiment  gave  us  the  oppor¬ 
tunity  to  carefully  examine  and  evaluate  the  performance  of  the  Z-80 
in  a  complicated  simulation.  The  implications  of  the  microcomputer’s 
performance  apply  directly  to  a  major,  on-going  consideration  of  the 
Low  Cost  Electronics  project:  the  effective  utilization  of  inexpen¬ 
sive  LSI  microprocessors  in  signal  processing  tasks. 

The  report  will  deal  with  a  specialized  topic  and  certain  terms 
used  frequently  throughout  the  text  may  not  be  familiar  to  the  non- 
specialist.  Interested  readers  are  referred  to  the  reference  works 
of  Wozencraft  and  Jacobs  [19],  Gallager  [20],  Peterson  and  Weldon 
[21],  and  Viterbi  and  Omura  [22],  to  supplement  our  explanations  where 


necessary. 
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SECTION  I 


INTRODUCTION 


1.1  Purpose 

A  continuing  concern  of  the  Low  Cost  Electronics  project  is  the 
effective  use  of  commercially  available  large  scale  integrated  micro¬ 
processors  and  associated  memory  in  suitable  signal  processing  tasks. 
Because  of  our  interest  in  the  practical  application  of  error  correc¬ 
tion  coding,  and  the  software-intensive  aspects  of  sequential  decoding 
of  convolutional  codes,  we  selected  a  sequential  decoding  algorithm  as 
an  application  example.  Among  the  various  decoding  algorithms  available, 
we  have  focused  our  attention  on  a  recently  developed  sequential  de¬ 
coding  algorithm^ the  Multiple  Stack  Algorithm  (MSA)  [1], because  of 
its  expected  performance  and  since  its  decoding  operations  seemed 
quite  suitable  for  microprocessor  implementation. 

This  report  examines  in  some  detail  the  MSA  and  its  implementation 
on  a  low-cost,  general-purpose  small  computer,  typified  by  the  Zilog 
Z-80  microprocessor. 

The  MSA  is  an  improved  form  of  sequential  decoding  which  reduces 
erasure  probabilities  and  obtains  potentially  low  undetected  error 
probabilities  with  a  modest  decoding  effort.  MSA  decoding  complexity 
tends  to  be  independent  of  the  code  constraint  length  which  means  that, 
when  sufficient  memory  is  available,  high  performance  codes  with  larger 
encoder  constraint  lengths  can  be  used.  MSA,  which  achieves  nearly  as 
equal  an  error  rate  as  maximum-likelihood  decoding  at  faster  through¬ 
put  for  similar  levels  of  decoder  complexity,  may  be  one  of  the  most 
effective  alternatives  to  the  popular  Viterbi  decoding  algorithm  for 
convolutional  codes. 

1.2  Background 

The  Low-Gost  Electronics  work  in  digital  error  control  recognizes 
that  error-correction  coding,  used  in  conjunction  with  spread-spectrum 
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modulation,  provides  an  added  dimension  to  the  design  of  jam-resistant 
communications  systems.  Our  work  addresses  the  perplexing  dilemma 
of  error-correction  coding  performance  (or  coding  gain)  versus  cost 
and  complexity  of  implementation.  Our  work  concerns  the  effective 
use  and  implementation  of  both  convolutional  and  linear  algebraic 
block  codes.  This  report  addresses  our  work  in  convolutional  decoding. 

Convolutional  codes  are  frequently  described  as  tree  codes 
because  the  set  of  possible  code"  words  can  often  be  pictured  as  a 
tree-like  structure,  each  code  word  being  represented  by  a  distinct 
branch  emanating  from  the  root  node.  Probabilistic  decoding  algorithms 
have  been  devised  that  attempt  to  select  a  branch  in  the  tree 
minimally  distant  from  the  received  sequence  of  code  symbols  that 
has  been  corrupted  by  noise.  These  algorithms  separate  into  two 
classes:  maximum  likelihood  decoding  which  makes  an  optimal  decision 
at  each  branch  point  in  an  estimated  code  tree,  and  sequential 
decoding  which  attempts  to  select  an  optimum  path  through  the  tree  by 
successive  iteration  that  frequently  requires  back-search. 

The  maximum  likelihood  algorithms,  represented  most  typically 
by  the  well-known  Viterbi  algorithm,  require  all  the  nodes  in  the 
tree  to  be  examined.  Consequently,  they  are  limited  to  relatively 
short  codes  by  the  available  processor  storage  capacity  (and  access 
time) ,  this  storage  requirement  expanding  exponentially  with  code 
constraint  length. 

The  sequential  algorithms  search  the  code  tree  structure  for 
correct  codewords  on  a  trial-and-error  basis.  Incorrect  paths  are 
identified  and  rejected. 

Both  sequential  and  Viterbi  decoding  offer  practical  choices  to 
a  communication  engineer  designing  a  high  performance,  efficient. 
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communication  system,  but  the  usefulness  of  both  methods  has  been 
limited  by  the  decoding  complexity  and  speed.  For  Viterbi  decoding, 
both  the  computational  complexity  and  decoding  effort  are  proportional 
to  ev  (where  v  is  the  constraint  length  of  the  encoder);  therefore, 
it  is  practically  limited  to  short  codes,  usually  v  <  8.  For 
sequential  decoding,  both  the  complexity  and  decoding  effort  are 
nearly  independent  of  v,  but  the  number  of  computations  that  the 
decoder  must  perform  to  decode  the  received  sequence  is  a  random 
variable  of  Pareto  distribution  [2]  which  makes  the  decoding  in¬ 
complete  in  finite  time.  There  is  always  a  small  probability  that 
certain  error  patterns  will  never  be  decoded  or  that  erasures*  will 
occur  when  the  decoding  attempt  is  terminated. 

Under  Low  Cost  Electronics  we  are  experimenting  with  a  recently 
developed  algorithm  known  as  a  multiple  stack  algorithm  that  combines 
elements  of  both  maximum  likelihood  and  sequential  decoding.  Basically, 
this  algorithm  progresses  rapidly  through  the  code  tree  to  make  a 
coarse  sequential  decoding  estimate  and  then  examines  the  vicinity 
of  each  questionable  branch  point  to  refine  the  estimate.  The  problem 
of  stack  overflow  is  reduced  by  organizing  the  memory  in  multiple 
stacks,  appropriately  ordered  by  a  likelihood  function  metric,  in 
which  tentative  decisions  are  made  and  then  stored.  With  the  proper 
choice  of  codes,  the  multiple  stack  algorithm  should  provide  a 
superior  alternative  to  either  the  maximum  likelihood  or  strictly 
sequential  decoding  methods  used  alone.  The  new  algorithm  has  been 
Implemented  in  our  laboratory  with  an  8-bit  microprocessor  in  a 
facility  permitting  design  tradeoffs  and  direct  comparisons  to  be 
performed  easily. 


♦Erasure  -  Failure  of  the  decoder  to  reach  a  decision. 
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1.3  Scope 


To  Implement  the  MSA  on  a  microcomputer  and  test  its  speed  and 
error  performance  in  a  noisy  environment,  the  following  assumptions, 
based  on  practical  considerations,  have  been  made: 

(a)  The  Information  can  be  processed  independently  in 
blocks  if  the  block  lengths  (k)  are  at  least  4  or 
5  times  the  constraint  length  (v) . 

(b)  Erasure  probability  should  be  less  than  10  ^ . 

-4 

(c)  The  undetected  bit  error  rate  should  be  less  than  10 

for  algnal-to-noise  ratios  above  5.5  dB  (i.e.,  binary 

_2 

symmetric  channel  error  rates  less  than  3  x  10  ) . 

This  report  also  considers  the  practicality  of  real-time  MSA 
decoding  using  the  Zilog  Z-80  computer.  There  is  reason  to  believe 
that  if  a  high-speed  microprocessor  such  as  the  Z-80  or  one  of  its 
successors  is  used  and  if  an  adequate  buffer  is  available,  actual 
on-line  sequential  decoding  can  be  achieved  at  acceptable  data  and 
low  error  rates. 

Since  quantitative  results  are  difficult  to  obtain  analytically 
for  convolutional  decoding,  simulations  were  performed.  The  results 
presented  in  this  report  are  based  entirely  upon  these  simulations. 
Although  a  binary,  additive  white  Gaussian  noise  channel  was  used 
in  the  simulations,  many  of  the  results  are  applicable  to  other 
random  noise  channels. 

The  following  section  describes  the  MSA  and  the  software 
Implementation  of  the  MSA  on  the  Zilog  Z-80  microcomputer  system 
organized  for  this  purpose  on  Project  7010.  Section  III  discusses 
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the  selection  and  construction  of  fast-decodable  codes  which  achieve 
both  a  low  error  probability  and  a  minimal  erasure  probability. 
Section  IV  describes  the  parameter  selection,  and  Section  V  states 
some  important  properties  of  the  MSA.  The  performance  of  the  MSA 
and  comparison  with  the  performance  of  other  convolutional  decoding 
algorithms,  especially  that  of  the  Viterbi  algorithm,  are  analyzed 
and  discussed  in  Section  VI.  The  final  section  presents  conclusions 
and  suggestions  for  further  improvements  and  applications. 


IfflfpaMfl  PaCS  BUk*-MOT  Flli*D 


SECTION  II 

MULTIPLE  STACK  ALGORITHM 


2.1  Brief  Review  of  Convolutional  Codes 


Reliability,  simplicity  of  Implementation  and  nearly  optimal 
performance  make  convolutional  decoding  by  means  of  the  MSA  an 
appealing  application  example  for  effective  use  of  standard  micro¬ 
processors  as  signal  processors.  Figure  1  shows  a  very  simple 
convolutional  encoder.  It  consists  of  a  3-stage  shift  register  with 
information  bits  shifted  in  sequentially,  two  modulo-2  adders 
(exclusive  OR's)  and  a  multiplexer  for  the  two  resulting  bit  streams. 
In  the  terminology  of  convolutional  codes,  the  encoding  constraint 
length,  v,  is  equal  to  the  number  of  shift  register  stages  and  the 
code  rate  is  the  number  of  bits  kQ  shifted  into  the  register  divided 
by  the  number  of  multiplexed  output  bits nQ  sent  on  the  channel.  In 
the  example,  v  -  3  and  the  code  rate  (R£)  ■  1/2. 

As  previously  noted,  a  convolutional  code  is  conveniently 
illustrated  by  a  tree  diagram  (Figure  2) .  The  points  of  divergence 
are  called  nodes,  the  horizontal  transitions  are  called  branches. 

If  the  first  input  bit  is  a  zero,  the  code  symbols  are  those  shown 
on  the  first  upper  branch,  while  if  it  is  a  one,  the  output  code 
symbols  are  those  shown  on  the  first  lower  branch.  Similarly,  if 
the  second  input  bit  is  a  zero,  we  trace  the  diagram  upward.  In 
this  manner,  all  sixteen  possible  outputs  for  the  first  four 
inputs  may  be  traced. 

Convolutionally  encoded  messages  are  considered  reliable  because 
a  given  number  of  encoded  symbols  (n^v)  affects  each  information  bit 

*It  is  conventional  [4]  to  regard  the  blocks  in  Figure  1  as  shift 
registers  and  the  intervals  before  and  between  as  stages. 
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permitting  the  possibility  of  correcting  noisy  data.  For  each 
information  bit  incorrectly  transmitted,  there  will  be  nQv  symbols 
to  check  and  correct  it. 

These  tree  codes  can  be  practically  decoded  by  maximum  likelihood 
decoding  algorithms  only  when  the  constraint  length  is  short.  The 
Viterbl  algorithm,  for  Instance,  combines  nodes  representing  equivalent 
states  to  form  a  trellis  diagram  representation  of  the  code  tree, 
and  systematically  searches  the  trellis  for  maximum  likelihood 
estimates  of  transmitted  sequences.  Only  the  most  likely  branch  into 
each  node  at  a  given  trellis  depth  is  retained.  But  when  the  con¬ 
straint  length  of  the  code  becomes  very  large,  the  number  of  paths 
in  the  trellis  becomes  so  great  that  this  kind  of  decoding  method  1b 
impractical. 

The  Independent  nodes  create  paths,  the  number  of  which  increases 
exponentially  with  distance  as  we  progress  deeper  into  the  trellis. 

But,  since  many  different  paths  stem  from  the  early  nodes  in  the  tree, 
a  measurement  possibility  presents  itself.  If  we  could  use  a  means 
of  measuring  the  quality  of  the  paths  to  the  successive  nodes,  we 
might  then  be  able  to  effectively  disregard  paths  judged  to  be 
sufficiently  bad. 

Such  a  quality  measure  does  exist  in  the  form  of  the  Fano  metric, 
which  is  stated  below: 

Mf  (r)  -  log2  [-(l»rW) |-(l,rW)]  (1) 

P[Y  <l,rW>] 

Where  ^  and  X^  are  the  first  rW  digits  respectively 

of  the  received  sequence  Y  and  the  transmitted  sequence  X.  V  represents 
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the  number  of  code  bits  per  source  bit  (i.e.;  1/RC)  and  r  Indexes  the 
depth  in  the  tree.  B  is  a  bias  term  which  has  been  shown  by  J.  L.  Massey 
to  be  optimal  (in  the  sense  of  minimizing  decoded  error  probability) 
when  it  is  nearly  equal  to  the  code  rate  (Rc>  [3], 

On  the  average,  the  Fano  metric  Increases  monotonically  on  the 
correct  path  and  declines  in  pursuit  of  an  incorrect  path.  The 
metric  values  are  calculated  by  using  prior  knowledge  (or  assumption) 
of  the  channel  error  probability  and  the  Hamming  distance  between 
the  transmitted  and  received  sequences.  As  the  decoding  algorithm 
progresses  through  the  tree,  the  metric  value  must  be  stored  in 
association  with  the  corresponding  paths. 

2.2  The  Single  Stack  Algorithm  -  A  Sequential  Decoding  Method 

Sequential  decoding  is  a  generic  term  for  a  tree  code  probabilistic 
decoding  procedure  that  operates  by  making  tentative  hypotheses  on 
successive  branches  to  establish  a  path  through  the  tree  and  by 
changing  the  path  hypothesis  when  subsequent  choices  indicate  that 
earlier  choices  were  incorrect.  The  Fano  algorithm  represents  one 
of  the  early  methods  of  sequential  decoding  developed  for  a  class  of 
random  tree  codes  [4j.  This  algorithm  progresses  either  backward 
or  forward  through  the  code  tree  one  node  at  a  time  according  to  the 
path  metric  value  relative  to  a  preset  threshold. 

Although  the  correct  path  is  ultimately  found,  the  search  for 
that  path  is  restricted  to  a  route  through  connected  nodes  separated 
by  a  single  branch,  whether  the  decoder  is  advancing  or  retreating  in 
the  code  tree.  The  Fano  algorithm  method  tends  to  maintain  decoder 
hardware  complexity  and  storage  requirements  (excluding  buffer  storage) 
to  the  approximate  level  of  the  encoder.  This  is  achieved,  however, 
at  the  expense  of  many  sequential  computations  during  periods  of  high 
channel  noise. 
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The  single  stack  algorithm  (SSA) ,  developed  by  Ziganglrov  and 
later  by  Jellnek  [5],  is  an  Improved  sequential  decoding  method. 

The  SSA  reduces  the  complexity  of  path  search  by  providing  for  storage 
and  metric  reordering  of  all  previously  processed  path  data.  This 
allows  the  decoder  to  return  to  one  of  a  number  of  previously 
explored  nodes  according  to  the  relative  path  metric  values 
appropriately  ordered  and  stored  in  a  memory  stack.  This  improvement 
is  achieved  at  the  expense  of  increased  memory  requirements  and  the 
need  to  reorder  at  each  node  extension  the  path  metrics  stored. 

The  SSA  avoids  repeating  earlier  computations. 

The  principal  strengths  of  the  single  stack  algorithm  are: 

(1)  Bit  error  rates  which  decrease  exponentially 
with  code  constraint  length  at  information  rates 
below  channel  capacity, 

(2)  A  computation  load  that  remains  bounded,  independent 

of  constraint  length,  provided  the  code  rate  is 

less  than  a  computational  bound  (R  ) , 

comp 

(3)  Storage  requirements  which  grow  only  linearly 
with  code  constraint  length. 

The  SSA  also  has  weaknesses  which  could  make  its  results  unacceptable 
in  high  noise  situations.  One  such  limitation  is  the  variability 
in  computation  time  to  advance  one  node  in  the  tree.  To  keep  up 
during  noisy  periods  or  to  catch  up  quickly  when  the  channel  has 
again  become  quiet,  the  decoder  is  forced  by  this  variability  to 
include  sufficient  buffer  storage  and  to  possess  a  speed  advantage 
relative  to  the  channel  data  rate.  It  must  also  have  sufficient 
storage  capability  in  the  memory  stack. 
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We  know  that  sequential  decoding  can  be  described  quantitatively 
by  a  computational  effort  C.  We  interpret  C  to  be  the  number  of 
operations  or  computations  required  to  advance  a  unit  distance  in  the 
code  tree.  C  is  a  random  variable  having  a  Pareto  distribution*. 

In  other  words,  the  probability  P(C>N)  that  C  exceeds  some  large 
value  N  is  proportional  to  N  ^ 

P(C>N)  -  a  N~^  (2) 

where  the  Pareto  exponent  \p  is  independent  of  constraint  length, 
depending  only  on  channel  properties  and  code  rate  (a  is  a  constant  of 
proportionality)  [2] . 

Since  most  received  sequences  at  useful  channel  signal-to-noise 
ratios  are  decoded  with  very  little  effort,  average  computation  should 
be  lower  than  the  fixed  computational  effort  of  maximum  likelihood 
decoding.  But  there  is  always  some  fraction  of  received  sequences 
proportional  to  N  ^  which  imposes  an  impractically  large  computational 
load  and  results  in  incomplete  decoding,  which  we  interpret  here  as 
producing  erasures. 

Normally  a  computational  limit  (C^m)  is  established  beyond 
which  the  decoder  fails,  erasing  a  stream  of  data.  The  probability 
of  such  an  erasure  is  governed  by  the  Pareto  distribution: 

PERASURE  "  P(C>Clim)  ’  “Clim 


*Regardless  of  the  algorithm  used,  sequential  decoding  involves  a 
random  motion  in  the  tree,  and  hence  C  is  a  random  variable. 
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which  decreases  algebraically  as  a  function  of  C^m. 

These  erasures  are  generally  caused  by  an  overflow  of  the 
available  memory  in  the  input  buffer  which  stores  continuously 
arriving  channel  data,  or  by  overflow  of  the  finite  memory  stack 
which  stores  and  reorders  the  processed  data.  The  latter  occurrence 
is  more  prevalent.  Figure  3  presents  a  flow  chart  of  the  SSA. 

The  basic  steps  that  comprise  the  SSA  codes  are: 

(1)  Initialize  by  clearing  the  memory  table 
and  creating  one  entry  corresponding  to 
the  start  of  the  decoding  tree. 

(2)  Retrieve  the  entry  with  the  largest 
metric. 

(3)  Compute  the  branch  metrics  stemming 
from  the  node  found  in  step  (2)  and 
create  new  entries  in  place  of  the 
original.  Store  these  new  entries  in 
the  table. 

(4)  If  the  computational  limit  is  exceeded, 

the  procedure  halts  and  causes  erasures. 

If  the  C,  .  is  not  exceeded,  repeat 
11m 

steps  (1)  through  (3)  until  a  terminal 
node  is  reached  and  then  read  out  the 
decoded  path  from  the  origin  to  the 
terminal  node. 

When  the  computational  limit  is  exceeded  (or  the  stack  is  full  in  the 
SSA  sense),  this  algorithm  just  makes  a  random  guess  and  produces 
erasures.  This  event  is  governed  by  the  Pareto  distribution  as 
previously  discussed. 


EXTEND 

NODES 


CHECK 
TERMINAL, 
CHECK  Cllm, 
OUTPUT 
ERASURE 
OR  DECISION 


INITIALIZE 


ORDER 


Figm*  3.  FLOWCHART  OF  THE  SINGLE  STACK  ALGORITHM 
(WITH  FINITE  SIZE  MEMORY  STACK) 
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2.3  An  Improved  Stack  Algorithm  -  The  Multiple  Stack  Algorithm  (MSA) 


2.3.1  Basic  Strategy  of  the  MSA 

The  multiple  stack  algorithm  (MSA)  is  a  sequential  decoding 
procedure  devised  by  Chevillat  and  Costello  to  overcome  the  deficiencies 
of  the  single  stack  algorithm  [l].  The  MSA  strategy  differs  from 
that  of  the  SSA.  Where  the  SSA  advanced  slowly  during  noisy  periods 
exploring  many  incorrect  subsets  before  extending  the  correct  path, 
the  MSA  progresses  quickly  through  the  tree  to  find  a  reasonably 
good  tentative  estimate  after  which  the  alternatives  are  explored  in 
search  of  the  maximum-likelihood  path. 

The  problem  of  stack  overflow  is  accommodated  by  providing  a 
hierarchy  of  memory  stacks.  Each  stack  is  used  to  process  a  subset 
of  nodes  having  the  best  metric  values  transferred  to  them  from  the 
previous  stack.  As  these  distributed  stacks  are  filled,  the  data  is 
processed  in  the  final  stack  to  arrive  at  a  tentative  decision  for 
that  stack.  This  decision  is  stored  and  compared  with  a  new  tentative 
decision  reached  by  processing  in  the  previous  stack,  and  the  pro¬ 
cessing  is  continued  until  a  decision  is  reached  in  the  first  stack. 

The  best  tentative  decision  is  kept  at  each  iteration,  and,  when 
processing  is  concluded  in  the  initial  stack,  the  best  decision  be¬ 
comes  the  decoded  path.  Decoding  can  also  be  terminated  by  exceeding 
the  determined  in  this  case,  the  best  tentative  decision  yet 

obtained  becomes  the  decoding  decision.  The  MSA  produces  the  error 
performance  of  an  SSA  with  infinite  stack  size  by  iterating  the  de¬ 
coding  algorithm  in  a  set  of  subordinate,  finite-size  stacks. 

Although  it  is  Pareto  distributed,  the  probability  distribution 
of  computational  effort  can  be  upper-bound  by  an  exponentially  decreasing 
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function  of  the  code's  column  distance  function*.  This  fact  is 
useful  for  the  MSA  and  makes  it  best  suited  for  decoding  codes  with 
large  column  distance  function. 

The  MSA  retains  these  desirable  properties  of  sequential  decoding: 

1.  complexity  and  computational  effort  are  relatively 
independent  of  code  constraint  length,  and 

2.  average  decoding  is  faster  than  maximum  likelihood 
decoding. 

Furthermore,  the  MSA  allows  complete  erasurefree  decoding.  Such  a 
complete  decoding  method,  capable  of  achieving  low  error  probabilities 
with  substantially  lower  average  decoding  effort  than  the  Viterbi 
algorithm,  would  seem  to  be  a  desirable  compromise  between  direct 
maximum-likelihood  and  strict  sequential  decoding. 

2.3.2  Flowchart  Description  of  the  MSA 

The  multiple  stack  decoder  consists  of  a  central  processor  and 
a  number  of  finite-size  memory  stacks.  Decoding  begins  by  placing 
the  origin  node  of  the  estimated  code  tree  into  the  first  stack. 

The  flow  chart  in  Figure  4  shows  that  the  MSA  initially  operates 
exactly  like  the  conventional  single  stack  algorithm.  Starting  with 
the  origin  node,  the  top  node  in  the  stack  is  extended.  After  its 
elimination  from  the  stack,  the  successors  are  inserted  and  the 
stack  is  put  in  order  according  to  the  metrics  of  the  nodes.  Each 
entry  consists  of,  among  other  things,  a  node  identifier  and  its 
metric.  Decoding  proceeds  by  extending  the  node  on  the  top  of  the 
stack  again.  The  terminal  node  distance  is  established  by  setting  a 


★Column  distance  function  (CDF)  is  defined  as  the  minimum  Hamming 
distance  between  distinct  paths  divergent  from  the  first  branch  as 
a  function  of  distance  into  the  code  tree. 
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Fiflur*  4.  FLOWCHART  OF  THE  MULTIPLE  STACK  ALGORITHM 
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decoder  constraint  length.  It  is  normally  4  or  5  times  the  encoder 
constraint  length,  since  all  paths  merge  into  the  same  nodal  state  with 
high  probability  at  this  approximate  distance  [6] .  The  distance  of  the 
terminal  node  determines  the  size  of  the  processed  blocks  of  data.  The 
MSA  processes  each  block  as  a  unit. 

The  basic  steps  that  comprise  the  MSA  are: 

INITIALIZATION 

(1)  Initialize  by  clearing  the  memory  table  and  creating 
one  entry  corresponding  to  the  start  of  the  decoding 
tree. 

PATH  EXTENSION /ORDERING 

(2)  Retrieve  the  entry  with  the  largest  metric. 

(3)  Compute  the  new  branch  metrics  stemming  from  the 
node  found  in  step  (2)  and  store  these  entries  in 
place  of  the  original. 

STACK  CREATION 

(4)  If  the  first  stack  table  is  full,  transfer  the  top 

T  entries  into  the  next  secondary  stack  and  continue 
to  process  in  subsequent  stacks  as  necessary  until  a 
temporary  decision  is  reached  for  a  complete  path 
containing  k  information  bits,  where  k  is  set  by  the 
user. 

BACK  SEARCHING  (STACK  DELETION) 

(5)  Go  back  by  processing  in  reverse  order  in  the  set  of 
subordinate  stacks  to  improve  the  decision  by  finding 
a  complete  path  with  a  larger  metric  value  to  replace 
the  current  one  until  the  computational  limit  is 
reached . 
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TERMINATION 


a  decision  is  reached  at  the  first 
stack,  or 


(6)  Terminate  if  (i) 

(ii)  the  computational  limit  C, .  is 

lltD 

reached . 

Notice  the  differences  between  the  MSA  and  SSA.  Instead  of  quitting 
when  the  first  stack  has  filled  and  calling  the  entire  block  an 
erasure,  the  MSA  continues  processing  in  secondary  stacks  to  reach 
a  decision,  backsearches  to  refine  the  tentative  decision,  and  then 
terminates  decoding.  Termination  occurs  either  after  satisfactorily 
achieving  maximum  likelihood  decoding  or  by  reaching  a  set  computa¬ 
tional  limit. 

If  the  terminal  node  is  reached  before  the  first  stack  fills  up, 
decoding  is  completed,  and  the  path  from  the  origin  to  this  terminal 
node  becomes  the  decoded  codeword.  In  this  case,  the  MSA  executes 
exactly  the  same  decoding  steps  as  the  SSA.  If  the  first  stack 
fills  up  before  the  terminal  node  has  been  reached,  the  top  T  nodes 
of  the  first  stack  (the  most  likely  ones)  are  transferred  to  a  second 
stack  where  decoding  proceeds  using  these  T  transferred  nodes.  T  is 
a  decoding  parameter  to  be  selected. 

If  the  top  node  in  the  second  stack  reaches  the  terminal  node 
before  the  stack  fills  up,  the  codeword  corresponding  to  this  terminal 
node  is  stored  as  a  tentative  decision  in  a  special  register.  The 
decoder  then  deletes  the  remaining  nodes  in  the  second  stack  and 
returns  to  the  first  stack  where  decoding  continues.  Since  T  nodes 
have  been  removed  and  transferred  at  the  time  of  overflow,  exactly  T 
spaces  are  available  in  the  first  stack.  If  the  decoder  reaches  a 
terminal  node  before  the  first  stack  fills  up  again,  the  path  metrics 
of  the  new  terminal  node  are  compared  with  those  of  the  tentative 
decision.  The  node  with  the  best  path  metric  becomes  the  decoding 
decision. 


19 


But,  if  the  first  stack  fills  up  again  before  the  terminal  node 
is  reached,  a  new  second  stack  (the  previous  entries  in  the  second 
stack  having  been  deleted)  is  formed  by  again  transferring  the  top 
T  nodes  of  the  first  stack.  If  this  stack  also  fills  up  before  a 
tentative  decision  can  be  made,  a  third  stack  is  formed  by  transferring 
the  top  T  nodes  from  the  second  stack.  Additional  nodes  are  formed 
in  a  similar  manner  until  a  tentative  decision  is  reached.  The 
decoder  always  compares  a  new  terminal  node's  path  metric  to  that 
of  the  node  stored  in  the  tentative  decision  register  and  retains 
the  node  with  the  best  metric  available.  The  rest  of  the  nodes  in 
the  stack  are  then  deleted  and  decoding  proceeds  in  the  previous  stack. 

The  algorithm  terminates  decoding  when  it  reaches  the  end  of 
the  tree  in  the  first  stack  or  when  it  exceeds  the  Clim  set  by  the 
user.  In  the  former  case,  the  node  at  the  top  of  the  first  stack 
becomes  the  final  decoding  decision  (maximum-likelihood  decoding) . 

In  the  latter  case,  the  best  tentative  decision  yet  obtained  becomes 
the  final  decoding  decision  (non-maximum-likelihood  decoding) . 

We  would  like  to  take  a  moment  to  discuss  the  possible  occurrence 
of  undetected  (or  undetectable)  errors  In  the  MSA  decoder.  Undetected 
errors  will  occur  if  the  channel  error  patterns  are  such  that: 

(1)  the  correct  path  was  not  transferred  to  secondary 
stacks,  and  C^m  was  reached  before  returning  to  the 
first  stack  (e.g.,  the  correct  path  was  available 
throughout  the  decoding  process,  but,  because  C1 
was  reached,  an  incorrect  path  was  elicited),  or 

(2)  the  correct  path  was  transferred  from  the  first  stack, 
but  the  largest  metric  found  for  all  decoding 
decisions  at  secondary  stacks  was  equal  to  or  greater 
than  that  for  the  correct  path,  or 
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(3)  the  metric  value  of  the  correct  path  was  less  than 
the  value  of  some  incorrect  path  such  that  the 
incorrect  path  with  the  larger  metric  value  was 
finally  preferred  by  the  decoder,  and  decoding  was 
completed  in  the  first  stack. 

The  error  events  of  type  (3)  are  simply  those  that  would  remain 
undetected  by  an  ideal  maximum-likelihood  decoder.  The  error 
events  of  types  (1)  or  (2)  would  be  decoded  by  such  a  decoder.  The 
parameters  of  the  MSA  should  be  chosen  to  minimize  the  events  of 
types  (1)  or  (2)  at  the  required  throughput. 

2.3.3  Software  Implementation  of  the  MSA 

The  MSA  was  Implemented  on  a  Zilog  Z-80  microprocessor  in  our 
laboratory.  This  versatile,  third  generation  machine  was  utilized 
for  several  reasons  including  the  fast  instruction  cycle  time, 
operations T-code  efficiency,  multibyte  instruction  capability,  and 
suitability  for  the  stack  reordering  operation.  To  satisfy  the 
anticipated  memory  requirements  of  the  MSA,  a  RAM  containing  48k 
bytes  of  memory  was  incorporated  into  the  development  system.  Rate 
\  convolutional  codes  of  constraint  length  8  and  15  were  considered 
for  implementation  of  the  algorithm  ,  as  practical  and  challenging  examples. 

The  entire  simulation  can  be  separated  into  three  parts:  the 
random  error  generator,  the  convolutional  encoder  and  the  MSA 
decoder,  as  shown  in  Figure  5 . 

First,  a  16-bit  linear  feedback  shift  register  pseudo-random 
number  generator  is  used  to  provide  sequences  of  nearly  random 
source  bits  to  be  encoded  by  the  (2,l)v  convolutional  encoder.  The 
resulting  codewords  are  then  fed  to  a  simulated  binary  symmetric 
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'■  block  diagram  of  the  multiple  stack  algorithm  simulation 


channel  (BSC)  which  combines  the  output  of  a  linear  congruential 

* 

random  number  generator  with  the  bits  produced  by  the  source  generator. 
The  inputs  to  the  MSA  decoder  are  the  corrupted  channel  digits. 

Finally,  the  decoded  message  sequences  of  the  MSA  are  compared 
with  the  suitably  delayed  version  of  the  actual  transmitted  message 
sequences**.  The  number  of  errors  undetected  by  the  MSA  is  finally 
recorded  and  serves  as  a  measure  of  the  performance  of  the  MSA 
decoder.  Details  of  these  blocks  of  the  MSA  simulation  are  presented 
below. 

2 . 3 . 3 . 1  The  Random  Error  Generator.  The  random  error 
generator,  which  is  used  to  simulate  the  effects  of  additive  white 
Gaussian  noise  on  a  communication  channel,  consists  of  a  16-bit 
linear  feedback  shift  register  (LFSH) ,  pseudo-random  number  generator 
with  feedback  polynomial  210013g  and  a  16-bit  linear  congruential 
pseudo-random  number  generator  which  produces  a  sequence  of  residues 
of  a  large  modulus  by  means  of  a  linear  transformation  that  uses  the 
recursive  relationship: 

Y(i  +  1)  -  (22  +  1)  Y(i)  +  1  mod  216  (4) 

It  can  be  shown  that  both  congruential  and  shift-register  generators 
have  regularities  that  make  them  individually  unsuitable  for  general 
Monte  Carlo  use,  but  combining  the  generators  in  various  ways  appears 


to  produce  satisfactorily  random  sequences.  The  method  of  combination 


A  linear  congruential  generator  produces  a  sequence  of  residues 
of  a  large  modulus  m  by  means  of  a  linear  transformation  x.  ...-ax.+b 
mod  m.  0  <  x  <  m  [23].  11  1 

ft*  ^ 

Normally,  a  decoding  decision  is  withheld  until  the  entire  received 
sequence  has  been  processed. 
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we  used  is  known  as  algorithm  M.  It  combines  two  sequences  U  and  V 

of  uniform  distribution  U^,  .  V^,  V^,  V^,  .  .  •  having 

magnitudes  between  0  and  2^®  -  1,  produced  by  congruential  and  shift- 
register  generators.  Suppose  memory  locations  C(l),  C(2)  .... 
C(100)  are  filled  with  U.  One  generates  a  new  V,  uses  its  last 
seven  digits  to  form  an  index  J  from  0  to  127  and  uses  C(J)  with  a 
newly  generated  U. 

Sequences  obtained  by  applying  Algorithm  M  successfully  passed 
the  tests  usually  applied  to  uniform  random  number  generators,  and 
also  passed  the  chi-square  test  of  frequency  distribution. 

2. 3. 3. 2  The  Convolutional  Encoder.  Encode  is  our  algorithm 
used  to  generate  the  digits  associated  with  the  tree  branches  at  a 
rate  required  by  the  decoder.  There  are  at  least  two  approaches 
to  convolutional  encoding  which  differ  in  storage  requirements. 

A.  Table  -  Lookup 

For  a  simple  (2,1)8  code,  encoding  can  be  achieved  by  table 
lookup  methods  which  eliminate  complicated  repetitive  computations. 
Simple  simulations  of  the  encoder  operation  can  be  accomplished  by 
forming  a  table  with  256  possible  connections  of  8-stage  shift 
registers.  The  tables  permanently  store  the  information. 

The  basic  tradeoff  in  the  table-lookup  method  is  time  vs. 
memory.  The  required  size  of  the  codeword  table,  which  increases 
exponentially  with  the  constraint  length  of  the  code,  obviously 
sets  a  limit  to  the  type  of  encoder  simulation.  A  table-lookup 
method  is  practical  only  for  short  constraint  length  codes,  and  these 
have  limited  error  correction  capability. 


24 


B.  Software  Emulation 


For  longer  codes,  such  as  (2,1)15  codes,  it  is  better  to 
emulate  the  actual  encoder  with  a  series  of  bit  shifts,  AND,  and 
EXCLUSIVE-OR  operations  according  to  the  given  generator  polynomials, 
generating  codewords  as  required.  These  are  extremely  simple 
operations  for  the  arithmetic  logic  unit  of  the  Z-80's  CPU  and  can  be 
programmed  efficiently.  This  method  is  also  flexible  because  codes 
with  different  generator  polynomials  can  be  implemented  on  the 
microcomputer  by  a  simple  software  change  in  the  encoder  tap 
connections. 

2. 3. 3. 3  The  Multiple  Stack  Decoder.  The  multiple  stack  decoder 
block  is  the  main  program  of  the  microprocessor  simulation.  The 
Inputs  from  the  MSA  are  sequences  of  corrupted  received  channel 
symbols.  The  decoder  includes,  among  other  things,  a  replica  of  the 
convolutional  encoder.  This  encoder  constructs  a  decoding  tree  with 
which  the  received  channel  symbols  are  compared  to  obtain  the  best 
estimate.  The  outputs  of  the  MSA  are  the  decoded  message  bits.  The 
probability  of  an  undetected  error  decreases  exponentially  with  the 
size  of  the  replicated  encoder  built  into  the  decoder.  In  short, 
the  MSA  decoder  utilizes  the  information  of  both  the  received  symbols 
and  the  code  tree  replica  to  recover  the  actual  transmitted  bits. 

2.4  Some  General  Merits  of  the  MSA 

2.4.1  Completeness 

For  those  few  received  sequences  which  require  excessive 
searching  because  of  high  noise,  the  MSA  maintains  a  good  non-maximum 
likelihood  estimate  of  the  correct  codewords.  It  does  not  suffer 
from  the  deficiencies  related  to  Pareto  distribution  of  computational 
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effort  which  can  cause  incomplete  decoding  in  the  SSA  in  high  noise 
environments.  The  MSA  establishes  several  secondary  stacks  to 
process  the  decoding  and  to  reach  some  tentative  decision  before 
exceeding  the  computational  limit.  Therefore,  all  codewords  are 
decoded  to  some  estimate  within  the  computational  limit.  The 
completeness  of  MSA  decoding  is  sometimes  termed  erasure  free  decoding. 

2.4.2  Optimality 

The  MSA  is  a  sequential  decoding  method  which  achieves 
asymptotically  optimum  error  performance.  Because  the  complexity  of 
sequential  decoders  is  relatively  Independent  of  constraint  length  v, 
the  constraint  length  can  be  made  quite  large  (24  <  v  <  48)  with  a 
very  small  probability  of  undetected  error.  In  the  Z-80  implementa¬ 
tion,  the  constraint  length  is  restricted  to  a  smaller  number  because 
of  limitation  on  memory  and  processor  speed.  For  rates  above  Rcomp* 
the  error  performance  of  the  sequential  decoding  algorithm  has  been  shown 
elsewhere  to  be  superior  to  that  of  certain  block  codes  of  the  same  length 
[7],  The  MSA  also  bears  certain  similarities  to  that  of  the  Viterbi 
algorithm  particularly  the  ability  to  do  some  limited  multiple-path 
extension. 

2.4.3  Decoding  Efficiency 

The  decoding  effort  in  the  MSA  is  characterized  by  a  random 
variable.  It  allows  some  adaptability  to  the  channel  noise  environ¬ 
ment  and  is  faster  on  the  average  than  the  constant  decoding  rate 
of  Viterbi  decoding  which  has  a  fixed  computational  effort  regardless 
of  the  channel  environment.  The  MSA  operating  in  real-time  need 
never  be  idle  [3].  If  the  most  recent  received  signals  correspond 


•The  computational  cutoff  rate  of  sequential  decoding  above  which  the 
average  number  of  computations  per  information  bit  tends  to  infinity 
for  large  block  size. 
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to  depth  d  In  the  decoding  tree,  and  if  the  current  node  is  at 
depth  d,  then,  while  waiting  for  more  received  signals,  the  decoder 
can  extend  the  stack  entry  which  has  the  largest  metric  and  which 
is  at  a  depth  less  than  d.  This  advance  work  costs  nothing,  and 
since  it  may  be  required  at  a  later  time,  it  permits  efficient 
operation  at  high  data  rates. 

Such  a  complete  decoding  algorithm  capable  of  achieving  low 
error  probabilities  with  a  lower  average  decoding  effort  at  higher 
throughput  may  be  a  desirable  alternative  to  the  popular  Viterbi 
maximum- likelihood  decoder  in  various  applications.  Although  not 
explored  here,  it  also  lends  itself  to  soft  decision  decoding. 

It  could  provide  an  effective  inner  code  in  a  concatenated  coding 
scheme  [s]. 

2.5  Codes  Used  in  the  MSA 

It  has  been  shown  that  codes  with  certain  distance  properties 
will  provide  both  low  error  probabilities  and  fast  decoding  for  the 
MSA  [9].  These  properties  are  closely  related  to  the  column 
distance  functions  of  the  codes.  Guided  by  this  function,  searches 
for  good  codes  suitable  for  the  MSA  were  performed  and  will  be 
reported  in  the  next  Section. 
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SECTION  III 

CODE  SELECTION  FOR  THE  MSA 


3.1  Systematic  Versus  Nonsystematic  Codes 

An  (nQ,  kQ)  convolutional  code  is  called  systematic  if  a  fixed 
set  kQ  of  the  output  symbols  is  equal  to  the  current  kQ  input  symbols. 
Otherwise,  the  code  is  nonsystematic. 

In  the  absence  of  errors,  the  use  of  systematic  codes  provides 
immediately  apparent  information  sequences  from  the  encoded  sequences. 
For  many  engineering  purposes,  such  as  synchronization  and  monitoring, 
it  is  desirable  to  get  reasonably  good  estimates  of  the  information 
digits  directly  from  the  received  sequences  without  first  employing 
the  decoding  process.  In  nonsystematic  codes,  however,  the  informa¬ 
tion  symbols  are  diffused  by  linear  combination,  and  this  convenience 
is  unavailable. 

Wozencraft  and  Relffen  have  shown  that  for  any  nonsystematic 
code  there  is  a  systematic  code  with  the  same  minimum  distance  [loj . 
Their  results  showed  no  advantages  in  using  nonsystematic  codes  with 
threshold  decoding,  but  their  research  may  have  overlooked  advantages 
of  using  nonsystematic  codes  with  other  types  of  decoders  not  then 
available. 

Recent  results  have  demonstrated  reduced  undetected  decoding 
probability  by  use  of  nonsystematic  codes  with  either  sequential 
or  maximum  likelihood  decoders  [ll].  It  has  been  concluded  further 
that  the  nonsystematic  codes  have  simpler  encoder  realizations  (shorter 
encoding  constraint  length)  than  the  equivalent  systematic  codes. 

For  these  reasons,  nonsystematic  codes  were  chosen  over  systematic 
codes  for  our  MSA  sequential  decoder  implementation. 
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3.2  Best  Codes  for  the  MSA 


3.2.1  The  Free  Distance  Consideration  of  the  MSA 

For  many  applications,  the  code  rate  =  %  is  chosen  as  a 
compromise  between  bandwidth  expansion  and  correction  capability. 

The  resultant  doubling  of  bandwidth  attains  within  1  dB  the  total 
gain  possible  by  this  type  of  coding,  but  does  not  catastrophically 
degrade  the  energy  per  transmitted  source  bit.  A  new  coding  gain 
remains.  Coding  gain  analyses  previously  performed  on  this  subject 
have  also  shown  *s-rate  codes  to  be  optimally  efficient  on  additive 
Gaussian  noise  channels  [l2] . 

For  our  study,  the  rate  h  Bahl-Jelinek  (B-J)  nonsystematic  codes 
with  complementary  generator  were  chosen  to  obtain  the  largest  free 
distance  [l3].  The  free  distance  of  a  linear  convolutional  code  is 
the  minimum  distance  between  pairs  of  infinite  length  code  sequences; 
it  is  also  equal  to  the  minimum  Hamming  weight  of  the  sequence. 

Of  all  convolutional  codes  known  so  far,  B-J  codes  are  the  only  ones 
that  achieve  maximum  free  distance: 


d 


free 


v  +  2  for  v  <  16 
o  o 


(5) 


where  vq  is  the  encoding  constraint  length  of  the  B-J  codes. 

3.2.2  The  Column  Distance  Function  Consideration  for  the  MSA 


To  choose  the  best  available  codes  for  the  MSA,  the  relationship 

between  the  column  distance  function  and  the  computation  effort  of 

the  MSA  has  been  studied.  The  column  distance  function  (d  (r)>  is 

c 

the  minimum  distance  between  code  sequences  that  diverge  from  the 
first  branch  as  a  function  of  depth  r  in  the  tree.  The  column 
distance  function  is  bounded  at  the  constraint  length  distance  by  the 
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minimum  distance  (the  distance  between  sequences  at  a  depth  equal 
to  the  constraint  length)  and  by  the  free  distance 

dc  (v)  <  dc  (r)  <  d£  (r  -*■  ®  );  r  >  v  (6) 

Chevillat  and  Costello  have  shown  that  the  computational  effort 
required  for  sequential  decoding  of  a  convolutional  code  is  upper 
bounded  by  an  exponentially  decreasing  function  of  the  code's  column 
distance  function  [  S] .  Codes  with  rapidly  increasing  column  distance 
function  are  best  suited  for  the  MSA. 

3.2.3  Code  Selection  Procedure 


Table  I  and  Table  II  shown  below  have  been  generated  by  Chevillat 

to  enable  the  selection  of  good  codes  for  the  MSA  [l^].  We  have  used 

these  tables  for  our  code  selection  as  will  be  described.  Table  I 

displays  an  "influence  limit"  parameter  d  as  a  function  of  the 

max 

crossover  probability  for  a  binary-symmetric  channel.  This  parameter 

(d  )  is  defined  as  the  value  of  the  code's  column  distance  function 
max 

beyond  which  it  has  little  or  no  influence  on  the  Pareto  distribution 
of  computational  effort,  P(C>N).  For  a  given  expected  channel  cross¬ 
over  probability  (or  equivalent  signal-to-noise  ratio)  we  would  like 

to  choose  a  code  whose  CDF  reaches  at  least  d  in  order  to  control 

max 

P(C>CU»>- 

In  addition,  we  also  should  choose  a  code  having  large  free 
distance,  djree,  *n  order  to  minimize  the  residual  error  probability 
P£.  To  do  this  we  attempt  to  reach  the  condition  of  equation  (5) 
for  the  Bahl-Jelinek  nonsystematic  codes. 

For  the  MSA  we  also  need  to  select  codes  with  rapidly  increasing 
column  distance  function  to  control  the  computational  load.  Chevillat 
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has  presented  (in  Table  II  below)  the  generator  polynomials  for  codes 

having  optimum  average  column  distance  growth  as  a  function  of  dmax 

and  the  code's  free  distance,  d, 

free 

For  a  (2,1)8  code  we  choose  d^ reg  =  v  +  2  =  10.  For  a  typical 

channel  crossover  probability,  p  *  .03,  we  see  from  Table  I  that 

d  =10  also;  that  value  is  also  adequate  for  p  <  .03.  We  select 
max 

from  Table  II  a  code,  with  generator  polynomial  (1344) 0,  that  meets 

O 

the  distance  requirements. 

Similarly,  we  can  choose  a  code  having  longer  constraint  length 

for  use  on  noisier  channels.  For  a  channel  crossover  probability 

of  p  =  .048,  we  find  in  Table  I  that  d  =15.  For  a  constraint 

max 

length  v  =  15  code,  we  can  obtain  dfrge  =  v  +  2  =  17.  From  Table  II 
we  find  that  a  (2,1)15  code  having  generator  polynomial  (141512)g 
has  rapid  column  distance  growth  and  satisfies  the  distance 
requirements. 

We  can  construct  both  the  (2,1)8  and  (2,1)15  encoders  using  the 
generator  polynomials  selected  above.  As  stated  earlier,  these  codes 
are  Bahl-Jelinek  (B-J)  nonsystematic  codes  with  complementary 
generators.  The  selected  generator  polynomial  is  therefore 
complemented  to  yield  another  polynomial  in  such  a  way  that  their  first 
and  last  coefficients  are  "one”  and  the  middle  coefficients  are 
complementary.  The  resulting  two  polynomials  are  then  used  as  the 
tap  connections  of  the  two  shift  registers  which  generate  the  code 
by  generating  two  output  bits  for  each  input  source  bit.  For  a 
(2,1)8  code,  the  chosen  generator  polynomial  is  ( 1 344) Q .  The 

O 

complementary  generator  polynomial  is  then  (1434)g.  The  (2,1)8 
B-J  encoder  is  constructed  as  shown  in  Figure  6.  Similarly,  the 
generator  chosen  for  the  (2,1)15  B-J  code  is  (141512)  .  The 

O 
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TABLE  I 

Influence  Limit  of  CDF  on  P(C>N) 


33 


w 

1-J 

a 


a 


34 


<14) 


35 


TABLE  II  (Continued) 


complementary  generator  becomes  (136266)g.  The  (2,1)15  encoder  Is 
shown  in  Figure  7.  These  encoders  are  used  to  generate  our  (2,1)8 
and  (2,1)15  code  sequences  for  the  MSA  simulations. 
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(W1512)g  AND  (138266)g 


SECTION  IV 


PARAMETER  SELECTION  FOR  THE  MSA  IMPLEMENTED  ON 
A  Z-80  MICROCOMPUTER  SYSTEM 


Parameter  selection  for  MSA  use  with  the  Z-80  is  an  important 
consideration  in  the  attainment  of  low  error  rates  and  high  throughput. 
Unlike  most  convolutional  decoding  algorithms,  the  MSA  allows  the  user 
freedom  to  choose  a  strong  constraint  on  the  amount  of  memory  used  by 
adjusting  the  decoding  parameters  within  a  tolerable  range  of 
performance.  Another  user  might  consider  the  memory  cost  to  be  low 
enough  to  be  willing  to  use  as  much  memory  as  necessary  to  achieve 
high  performance  and  throughput.  This  section  points  out  various 
options  available  to  the  user  in  parameter  selection  and  also  considers 
those  parameters  that  dominate  MSA  performance.  It  also  discusses  the 
Impact  of  the  Z-80  microprocessor  implementation  on  parameter  selection 
in  terms  of  cost  and  effectiveness.  Finally,  an  efficient  way  of 
choosing  the  parameter  set  is  suggested. 

To  evaluate  MSA  performance  in  terms  of  its  residual  error 
probability  (Pg)  we  must  study  Pg  as  a  function  of  the  MSA  and  channel 
parameters.  In  general,  Pg  increases  as  the  channel  signal-to-noise 
ratio  (SNR)  decreases.  For  MSA  operation,  the  designer  should  adjust 
the  decoder  performance  to  accommodate  variations  in  channel  SNR  by 
choosing  the  available  number  of  stacks  in  order  to  keep  P(C>C^im) 
sufficiently  small  and  by  selecting  the  code  constraint  length  (within 
computational  limits)  to  minimize  the  residual  error  probability. 

While  larger  constraint  length  (v)  would  ultimately  improve  error 
performance  (exponentially)  with  only  algebraic  Increase  in  complexity 
of  the  MSA  decoder,  only  (2,1)8  and  (2,1)15  codes  were  implemented  and 
tested  on  our  microcomputer  development  system  because  of  storage 
limits. 
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To  study  the  influences  on  performance  of  the  channel  crossover 
probability,  memory  stack  size,  selected  computational  limit,  number 
of  examined  tree  branches,  and  number  of  transferred  nodes,  extensive 
decoding  trials  were  run  during  the  Z-80  simulation.  We  expected 
the  performance  of  the  MSA  to  be  sensitive  to  the  restrictions  imposed 
by  a  microcomputer  system.  Had  we  implemented  the  MSA  on  a  fast 
minicomputer  [14],  we  would  have  selected  certain  parameters  to 
optimize  the  error  performance  and  achieve  erasurefree  decoding  with 
little  regard  for  storage  and  computational  effort.  But  for  Z-80 
implementation,  the  performance  was  restricted  by  available  memory. 

The  freedom  to  select  parameters  was  similarly  restricted. 

We  have  available  the  following  parameters  in  the  MSA  implementation: 

p:  The  crossover  (transition)  probability  of  the 

binary  symmetric  channel  (determined  for  a  white 
gaussian  additive  channel) , 

Z^:  The  size  of  the  first  stack, 

k:  The  number  of  branches  through  the  tree  from  the 

root  to  terminal  node  (without  tailing) ,  it  is  also 
called  the  decoding  constraint  length, 

Z^:  The  size  of  the  secondary  stacks  i  =  2,  .  .  .h 

T:  The  number  of  nodes  transferred  from  the  previous 

stack  upon  stack  overflow  (the  number  of  transferred 
nodes  is  the  same  for  each  transfer), 

^lim!  The  comPutational  limit  beyond  which  the  algorithm 

must  terminate. 

These  six  parameters  and  their  effect  on  PE  will  now  be  individually 
examined . 
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4.1  The  Effect  of  Channel  Crossover  Probability  p  upon  P„ 

The  transition  diagram  for  the  binary  symmetric  channel  is  shown 
in  Figure  8.  Consider  the  binary  symmetric  channel  model  with  con¬ 
volutional  coding  and  MSA  decoding  shown  in  Figure  9.  The  encoded 
random  sequence  is  corrupted  by  the  additive  white  gaussian  noise. 
Antipodal  signalling  and  symmetric  two-level  received  quantization 
are  assumed.  The  effects  of  the  transmitter,  modulator,  the  gaussian 
channel  noise  and  the  receiver  quantizer  are  all  described  by  the  BSC 
transition  probability,  which  for  the  model  of  Figure  9  is 


P  -  Q 

where  Q(-)  is  a  normal  distribution  probability  function  in  which 
E. /N  is  the  energy-to-noise  ratio  per  source  bit  and  R  «  k  /n  is 

D  O  COO 

the  code  rate  in  bits  per  transmitted  symbol.  For  all  the  codes  we 

use  here  R  »  The  probability  function 
c 


Q(a) 


(8) 


is  a  well-tabulated  function;  the  corresponding  values  of 
for  different  values  of  p  are  shown  in  Table  III. 

For  time  invariant  channels  with  additive  white  gaussian  noise, 
the  error  performance  without  coding  is  upper-bounded  by  an  exponentially 
decreasing  function  of  signal-to-noise  ratio  [l5].  With  convolutional 
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Figure  9  BSC  MODEL  WITH  CONVOLUTIONAL  CODING  AND  MSA  DECODING  SCHEME 


TABLE  III 


The  Corresponding  E^/N^  for  Selected  Cross¬ 
over  Probabilities,  p. 


BSC 

"p" 

wm 

0.0449 

4.609 

0.0410 

4.812 

0.0371 

5.032 

0.0332 

5.272 

0.0293 

5.529 

0.0254 

5.823 

0.0215 

6.129 

0.0195 

6.298 

0.0176 

6.465 

0.0156 

6.669 
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encoding  and  multiple  stack  decoding,  the  plot  of  decoded  error  rate 

Pg  as  a  function  of  is  expected  to  decline  much  more  steeply 

than  the  channel  bit  error  rate  curve  of  the  BSC,  provided  E,/N  is 

b  o 

sufficiently  high. 

But  as  the  channel  noise  increases  and  E./N  decreases,  the 

b  o 

error  rate  obtained  by  use  of  the  MSA  increases  to  a  limit  at  which 
the  deteriorating  signal-to-noise  ratio  makes  further  coding  undesirable 
as  it  would  actually  produce  a  loss  rather  than  gain.  For  our  simula¬ 
tions,  the  selected  value  of  p  is  set  by  randomly  toggling  a  fixed 
average  number  of  error  bits  (according  to  a  gaussian  distribution) 
throughout  all  codeword  bits  being  processed. 

4.2  The  Effect  of  First  Stack  Size 

The  first  stack  size  should  be  large  enough  to  completely  process 
most  of  the  received  sequences  so  that  only  those  sequences  that  are 
badly  corrupted  require  the  use  of  secondary  stacks.  The  residual 
error  probability  of  the  MSA  is  upper-bounded  by  the  decoded  error 
probability  of  the  single  stack  algorithm  and  other  factors: 
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and  P  is  the  erasure-estimate  error  probability  (i.e.,  the  probability 
es 

of  incorrectly  estimating  a  bit  in  the  codewords  selected  when  C>C,  .  ). 

lim 

For  example,  the  simple  technique  that  picks  at  random  a  codeword 

whenever  C>Clim  has  the  probability  1  -  2-kssl  of  choosing  the 

wrong  path;  in  this  case  Peg  depends  on  whicn  path  is  chosen.  A 

coin-toss  estimate  of  the  bits  would  produce  P  =  %. 

es 


The  erasure-estimate  error  probability,  Peg>  Is  relatively  small 
for  the  MSA  because  the  entries  with  the  best  metrics  are  always 
included  in  the  transfer  of  T  entries  to  secondary  stacks.  Therefore, 
the  component  of  that  is  proportional  to  P^  should  decrease 

with  increasing  first  stack  size. 


If  either  P  tends  to  zero  (best 
es 


estimate)  or  P^  tends  to  zero  (infinite  first  stack  size),  then 


PE (MSA)  <  PE(SSA)»  (11) 

which  is  the  limiting  case  that  achieves  ideal  p?rformance. 

Since  is  influenced  strongly  by  Z^  (the  size  of  the  first 

stack) ,  in  principle  we  could  have  increased  this  parameter  indefinitely 
to  eliminate  the  second  term  in  equation  (9).  Since  this  is  impractical 
(it  would  degenerate  to  an  infinite-stack  SSA) ,  we  must  vary  the 
remaining  parameters  to  control  Pgg  and  the  computational  effort  to 
achieve  efficiently  a  desired  low  error  rate. 

The  first  stack  size  (Z^)  has  a  strong  influence  on  the  overall 
error  performance.  Z^  was  made  as  large  as  practical  in  our  simulation 
to  allow  a  larger  percentage  of  codewords  to  be  decoded  in  the  first 
stack  which  accomplishes  maximum-likelihood  (optimum)  decoding.  For 
those  more  noisy  blocks,  which  overflow  the  first  stack,  the  parameter 
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of  computational  limit  enters  into  the  picture.  With  a  large  Z^, 
there  are  only  a  small  portion  of  the  codewords  (actually  proportional 
to  that  need  secondary  stack  processing.  is  chosen  to 

terminate  multiple-stack  processing  after  some  reasonable  time. 

For  cases  like  these,  the  MSA  is  designed  to  have  at  least  one 
decision  using  multiple-stack  processing.  However,  the  performance 
can  seldom  compete  with  the  optimum  decoding  of  the  hypothetical  SSA 
with  infinite  stack  size.  Therefore,  our  first  goal  was  to  choose 
Z^  as  large  as  possible  under  practical  memory  constraints.  This 
choice  allowed  the  majority  of  codewords  to  have  maximum-likelihood 
estimates.  Later,  will  be  chosen  to  yield  a  reasonable  decoding 

time  limit  for  remaining  patterns  needing  additional  processing. 
Theoretically,  by  making  sufficiently  large,  decoding  can  be 

finished  eventually  after  returning  to  the  first  stack. 

Simulations  we  made  have  shown  that  by  setting  Z^  =  1024  for 
(2,1)8  codes  and  **  2900  for  (2,1)15  codes  the  first  stack  size  is 
large  enough  to  obtain  low  error  probability  but  without  occupying  an 
excessive  number  of  memory  locations  in  the  microprocessor  system. 


*k  is  also  termed  "decoding  delay"  in  the  sense  that  the  decoder 
does  not  make  any  decisions  until  all  k  information  bits  have  been 
decoded. 
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n,:  the  total  codeword  bits  (including  tail  bits)  decoded 
d 

for  each  frame. 

A  frame  contains  a  total  number  of  k  information  bits  so  that  for  a 
rate  %  code 


nd  =  2  •  (k  +  v  -  1) 


(13) 


where  v  -  1  is  the  number  of  tail  bits  added  to  the  sequence.  After 
substituting  equation  (13)  into  equation  (12) ,  we  obtain 


Rd  =  15 


(k  +  v  -  1) 


(14) 


If  k  is  made  much  larger  than  ^he  encoding  constraint  length,  the 
decoder  rate  remains  at  effectively  the  code  rate  of  The  value  of 
k  does  not  greatly  influence  decoder  memory  or  computational  require¬ 
ments,  but  as  k  is  increased  a  decoding  decision  is  deferred  for  a 
longer  time  and  more  bytes  are  required  for  each  entry.  More  buffer 
storage  and  computation  effort  are  required  for  larger  blocks  of  k 
information  bits.  There  is  a  connection  between  k  and  computational 
limit  for  fixed  available  memory.  As  k  becomes  larger, 

should  be  set  smaller  such  that  memory  requirements  are  kept  within 
the  range  of  availability.  Pg  was  shown  to  increase  as  k  increased  [16]. 
It  has  been  shown  that  for  a  Viterbi  decoder,  a  value  of  k  that  is  4 
or  5  times  the  encoding  constraint  length  v  is  sufficient  for  neg¬ 
ligible  degradation  from  optimum  decoder  performance  [17],  For  the 
Z-80  implementation,  we  have  bounded  k  256,  and,  in  general,  we 
choose  64  <  k  5  128  for  most  situations. 
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4.4  The  Effect  of  Size  of  Higher-Order  Stacks  Upon 

The  higher-order  stacks  act  as  a  supplementary  aid  for  those 
received  sequences  which  require  searching  in  excess  of  the  first  stack. 
The  use  of  large  secondary  stacks  allows  the  MSA  to  reach  the  first 
tentative  decision  with  fewer  stacks,  but  the  computational  effort  may 
be  large.  These  two  effects  offset  each  other.  P  is  independent 
of  the  choice  of  Z^  which  only  affects  decoding  time  and  computational 
effort.  Z^  is  normally  much  smaller  than  Z^.  For  both  (2,1)8  and 
(2,1)  15  codes,  we  have  chosen  Z^  =  11  for  all  i  #  1;  it  was  selected 
as  a  compromise  between  the  number  of  stacks  and  the  computational  effort. 
With  the  higher-order  stacks  substantially  smaller  than  the  first  stack, 
the  error  rate  performance  of  the  MSA  approximates  that  of  the  SSA 
without  erasures.  * 

4.5  The  Effect  of  T  Upon  P 

lit 

An  increase  in  the  number  of  nodes  transferred  from  the  first 
stack  requires  more  stacks  to  be  formed  and  larger  computational  effort 
to  reach  the  first  tentative  decision.  But  it  also  increases  the 
probability  that  the  correct  node  will  be  transferred  to  the  next 
stack.  These  two  effects  offset  each  other.  Again,  P  is  independent 

ill 

of  T,  but  T  influences  decoding  time  and  computational  complexity. 

In  order  to  limit  the  required  number  of  stacks  and  the  computational 
effort,  T  was  selected  as  T  <  4. 

4.6  The  Effect  of  Computational  Limit 

An  increase  in  computational  limit  allows  more  chance  of  decoding 

in  the  first  stack  rather  than  algorithm  termination  by  exceeding  a 

smaller  C,.  .  Any  termination  of  decoding  in  the  first  stack  implies 
llm 

an  estimate  at  least  as  good  as  that  of  the  infinite-stack  SSA  (it  is 
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also  a  maximum-likelihood  estimate).  Therefore,  P  decreases  with 
increasing  To  ensure  that  the  MSA  is  erasurefree,  the  decoder 

must  have  at  least  one  tentative  decision  before  decoding  stops 
by  reaching  For  T  =  1,  a  critical  value  of  computational 

effort  Ccr^t  is  defined,  below  which  a  tentative  decision  cannot  be 
obtained.  C-^im  must  be  larger  than  to  guarantee  erasurefree 

decoding: 


^lim  >  ^crit 


(15) 


where 

Ccrit  *  S  <Zi  -  1)  +  2  <v  -  1)  U« 

and  v  is  the  encoding  constraint  length. 

From  Equation  (16)  we  conclude  that  the  computational  limit 

and  stack  sizes  (Z^  and  Z^)  are  closely  interrelated.  If  for 

example  Z,  ■  1024  and  Z .  *  11,  C.  .  must  be  at  least  greater  than 
l  x  11m 

1657  for  the  (2,1)8  MSA  to  ensure  erasurefree  decoding. 

An  increase  of  C^m  beyond  Ccr^t  does  not  improve  PE  greatly. 

Therefore,  we  chose  also  to  limit  memory  requirements.  For 

Z-80  implementation,  the  saving  of  storage  is  a  major  concern  and 

should  be  kept  at  a  value  that  prevents  memory  overflow.  A 

safe  value  of  C, .  for  the  above  (2,1)8  MSA  case  is  1700.  For  (2,1)15 
11m 

MSA,  we  chose  C, .  «  3600. 

lim 

For  T  >  1,  we  expect  that  more  computations  will  be  required 
to  achieve  erasurefree  decoding  than  for  T  ■  1.  Care  was  taken  to 
select  to  be  large  enough  to  ensure  erasurefree  decoding. 
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4 . 7  Summai 


In  conclusion,  Z..  and  k  should  be  made  large  enough  to  satisfy 

_4 

the  error  probability  required  —  <  10  .  should  be  chosen 

to  guarantee  complete  decoding  without  unduly  stressing  memory 
requirements.  and  T  should  be  bounded  to  limit  the  required 
memory  and  the  computation  effort.  A  summary  of  our  parameter 
selection  for  the  (2,1)8  and  (2,1)15  codes  is  shown  in  Table  IV. 
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TABLE  IV 


Selected  Parameter  Values  for  (2,1)8  and  (2,1)15  MSA 


MSA 

Parameters 

(2,1)8  MSA 

(2,1)15  MSA 

Z1 

1024 

2900 

Zi 

11 

11 

k 

64 

64 

Clim 

1700 

3600 

T 

3 

3 
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SECTION  V 

SOME  PROPERTIES  OF  THE  MSA 


Five  properties  of  the  MSA  decoding  are  discussed  below.  Most 
of  these  properties  and  their  proofs  are  taken  from  the  work  of 
Chevillat  fl4].  We  include  them  here,  with  additional  discussion, 
for  the  sake  of  completeness  and  to  enable  the  reader  to  grasp  more 
fully  the  salient  features  of  the  algorithm. 

5.1  Properties  and  Their  Proofs 

Vnopenty  1:  I  f  the  MSA  teAminateA  decoding  by  reaching  the  end  o  f 
the  tn.ee  in  the  f* OiAt  Atack,  itA  final  decision  it  at  leaAt  at  good 
04  the  deciAion  of  a  tingle  Atack  algonithm  voith  an  infinitely  lange 
Atack.  That  final  decision  it  equivalent  to  a  maximum-likelihood 
decoding. 

Proof :  If  the  MSA  completes  decoding  in  the  first  stack  without 
forming  any  additional  stacks,  it  executes  the  same  steps  as  the 
SSA,  and  the  decoding  path  (denoted  here  as  j)  will  be  the  same 
as  that  of  the  SSA  with  an  infinitely  large  stack. 

If  the  MSA  forms  higher-order  stacks  to  reach  the  first 
tentative  decision  and  j  is  included  in  the  T  transferred  nodes, 
j  will  be  among  the  tentative  decisions  reached  before  returning 
to  the  first  stack.  The  MSA's  final  decision  will  be  at  least 
as  good  as  j . 

If  the  MSA  forms  higher-order  stacks  to  reach  a  tentative 
decision  but  j  is  not  transferred  to  the  next  stack,  the  MSA 
obtains  some  tentative  decision  before  returning  to  the  first 
stack.  Again  the  MSA's  final  decision  will  be  made  in  the  first 
stack  and  will  be  at  least  as  good  as  j. 
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In  the  case  enumerated  in  the  proof,  the  correct  path  j  is  always 
among  the  decision  elements,  and  the  MSA's  final  decision  must  be  at 
least  as  good  as  that  of  the  SSA.  Therefore,  efforts  are  made  to  finish 
MSA  decoding  in  the  first  stack,  to  achieve  maximum-likelihood  decoding. 
It  has  been  shown  that  the  number  of  computations  which  the  SSA  decoder 
must  perform  to  decode  the  received  digit  is  a  random  variable  of 
Pareto  distribution;  i.e.: 


P  tC  >  CJ  ’  CSA  S'*  (17) 

where  C  is  the  number  of  computations  which  the  SSA  must  perform  to 

decode  a  block  of  k  information  bits  and  ip  is  a  parameter  which 

depends  on  the  channel  properties  and  the  rate  of  the  code.  In  the 

SSA,  we  call  the  probability  of  this  single  stack  overflow  the  erasure 

probability.  For  the  MSA,  this  is  just  the  probability  that  the 

number  of  node  extensions  C  (or  equivalently  the  number  of  computations) 

equals  or  exceeds  the  size  of  the  first  stack  Z^,  or  P  [c  >  (Z^  -  1)]. 

According  to  Equation  (17),  this  probability  equals  CgA  (Z^  -  l)-^  =  P^. 

The  proportionality  constant  CgA  and  the  parameter  ip  are  both 

independent  of  Z. .  (A  detailed  derivation  of  CL.  is  contained  in 
l  SA 

Reference  [l8] ) .  Although  P^  can  be  made  very  small  by  making  Z^ 
very  large,  some  fraction  of  codewords  (even  though  very  small)  will 
always  need  secondary  stacks  to  complete  their  decoding  process.  In 
the  SSA,  stack  overflow  causes  erasures  which  might  be  intolerable  for 
some  situations.  In  the  MSA,  more  stacks  are  available  to  obtain 
some  decision  before  the  computational  effort  is  exceeded,  as  described 
in  Property  2. 
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PAopeAty  2:  In  a  noi&y  environment,  Aeceived  Ae.que.ncu  which  Aequine 
exceAAive  Aearching  will  cauAe  the  SSA  to  quit  when  AtackA  oveA^low. 
With  the  MSA,  a  tentative  decision  iA  alwayA  obtained  begone  the 
computational  limit  iA  leached.  Thus,  while  the  SSA  Au^eAA  $Aom 
eAOAuACA  | which  meanA  it  would  have  to  gueAA  thiough  the  tn.ee  a{tei 
the  decode a  quitA),  the  MSA  continueA  to  decode  and  obtainA  a  non¬ 
maximum  likelihood  decision  which  haA  a  loweA  caaoa  piobability  than 
a  Aandom  gueAA. 

Proof;  Because  the  MSA  has  additional  stacks  available  for 
processing,  a  tentative  decision  will  be  reached  if  the 
computational  limit  is  set  sufficiently  high.  This  temporary 
fast  searching  to  get  at  least  some  results  is  a  compromise 
between  sequential  searching  and  parallel  processing.  If 
the  SSA  produces  an  erasure,  the  user  may  (1)  randomly  select 
a  codeword,  or  (2)  make  a  coin-tossing  guess  to  select  the 
decoded  bits.  Since  the  decoding  metric  increases  monotonically 
on  the  correct  path  and  declines  on  all  incorrect  paths,  and 
since  the  best  metric  examined  is  retained  for  a  tentative 
decision,  strategy  (1)  will,  on  the  average,  offer  a  path 
with  a  poorer  metric  than  the  tentative  decision.  In  strategy 
(2) ,  the  path  metric  for  the  random  guess  is  no  larger  than 
the  smallest  tree-path  metric,  since  the  decoded  bits  selected 
by  the  user  are  statistically  Independent  of  the  source  bits. 
Consequently,  once  a  tentative  decision  is  obtained,  the  estimate 
Is  better  than  a  random  guess. 

PiopeAty  3:  Foa  T  =  f,  the  numbeA  o$  AtackA  (a)  ^ oimed  be^oAe  the 
terminal  node  iA  Aeached  ioA  the  ^iAAt  time  neveA  exceeds  the  numbeA 
0(5  tiee  bAancheA  (fe)  without  the  tail  U.e.,  P(u>fe)  *  0  pAovided 
Clim  **  n0*  exceeded) .  A t  moAt,  k  AtackA  aie  formed  to  leach  the 
faiiAt  tentative  deciAion. 
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Proof :  We  are  trying  to  predict  the  maximum  number  of  stacks 
required  before  a  tentative  decision  is  reached.  For  T  =  1, 
only  one  node  entry  is  transferred  to  a  new  stack  at  the  time 
of  overflow.  The  first  stack  guarantees  that  the  top  of 
stack  1  at  the  time  of  overflow  has  at  least  tree  path  length 
one.  Similarly,  the  path  at  the  top  of  stack  i  at  the  time 
of  its  overflow  has  at  least  length  i.  Since  the  tree  does 
not  branch  in  the  tail  and  the  tree  is  of  path  length  k,  to 
this  point  there  will  be  a  maximum  of  k  stacks  formed  before 
the  first  tentative  decision  is  reached. 

This  does  not  mean  that  we  need  only  k  stacks  to  terminate  decoding 
in  all  cases.  Although  it  requires  a  maximum  of  k  stacks  to  reach 
the  first  tentative  decision,  the  decoder  may  need  more  stacks  to 
finish  searching  and  comparison  to  reach  the  best  decision.  Property  3 
only  is  true  for  T  ■  1;  for  T  >  1,  more  than  k  stacks  may  be  needed 
to  reach  the  first  tentative  decision. 


Pnopenty  4:  Fan.  T  =  1  and  a  code  o h  note  h  having  conAtnaint  length  v, 
the  MSA  it  enaAun.ehK.ee  ih  exceeds  a  critical  lowen  bound  CcA^Jt 
given  by: 

caU’  Z  -  II  *  2lv  -  II  H*l 

i-  1 


whene  1 i=l,  . 


.  it>  the  Aize  oh  Atack  i  and  k  it>  the  numben. 
oh  tn.ee  bnancheA  without  the  tail. 


Proof :  From  Property  3  we  learned  that  a  maximum  number  of  k 
stacks  is  formed  before  the  first  tentative  decision  is  reached. 
Since  Z^-l  computations  are  executed  in  stack  i  before  overflow 
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(where  is  the  number  of  entries  in  the  iC^  stack,  i«l, 
2,  .  .  k-1) ,  a  total  of 


k-1 


E 

i=l 


(19) 


computations  are  performed  before  the  k1"*1  stack  is  formed. 

In  this  last  stack  where  the  tree  branches  only  once  more 
before  the  tail,  a  maximum  of  2(v-l)  node  extensions  are 
executed  before  the  end  of  the  tree  is  reached  (there  are 
(v-1)  nodes  to  be  branched  into  2  possible  tree  paths) . 

Hence,  a  maximum  of 

k-1 

Ccrit  "  £  (Z,  -  1)  +  2 (v-1)  (20) 

i=l  1 

computations  is  necessary  to  reach  the  first  tentative 

decision.  As  noted  in  Property  2,  the  MSA  is  erasurefree 

when  at  least  one  tentative  decision  is  obtained  before  the 

decoding  stops  by  reaching  C,,  .  If  we  set  >  C  ,  , 

lim  lim  crit 

the  MSA  will  obtain  the  first  tentative  decision  before 
decoding  is  terminated. 

This  property  is  especially  important  because  (1)  it  guarantees 
that  the  MSA  achieves  erasurefree  decoding,  and  (2)  it  points  out 
a  practical  relationship  between  the  computational  limit  and  stack 
storage.  Consequently,  it  helps  in  determining  so  the  MSA 

achieves  erasurefree  decoding  at  the  lowest  cost  of  storage  and 
computational  effort. 
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Property  5:  The  probability  P{u>v)  that  the  number  o f  stacks  needed  to 
reach  the  first  tentative  decision  exceeds  some  number  v  decreases 
exponentially  with  v  for  sufficiently  large  value  of  v.  The 
probability  P(C>Cy)  that  the  number  of  computations  C  needed  to 
reach  the  first  tentative  decision  is  greater  than  some  number  C 
decreases  exponentially  with  C  if  Cy  <s  the  number  of  computations 
executed  before  stack  v  overflows'- 

C„  =  Zj-1  +  Jr  (Z.-T)  (21) 

v  1  T=2  *- 

Proof:  First  we  calculate  the  number  of  computations  (node 
extensions)  executed  at  the  moment  of  overflow.  At  the 
first  stack,  exactly  Z^-l  computations  are  performed  before 
it  overflows.  At  the  moment  of  overflow,  the  top  T  entries 
are  transferred  to  the  second  stack  where  decoding  continues. 

Since  these  T  entries  occupy  the  second  stack,  only  Z^-T 
node  extensions  are  permitted  within  the  second  stack  before 
it  overflows.  Similarly,  only  Z^-T  node  extensions  can  be 
performed  at  the  i*"^  stack.  The  total  Cy  executed  before 
stack  v  overflows  is  accumulated  as  equation  (21) . 

The  proof  of  the  exponential  nature  of  P(u>v)  is  given 
by  Chevillat  in  reference  [14].  Consequently,  the  number  of 
computations  before  overflow  of  the  vC^  stack  as  given  in 
equation  (21)  can  be  applied  to  show  that  the  number  of 
computations  is  also  exponentially  distributed,  for  a  first 
tentative  decision,  r*  equivalently  erasurefree  decoding. 

Most  of  the  properties  of  the  MSA  have  been  discussed  above  for 
T-l,  but  they  can  be  generalized  for  T>1.  Although  proofs  have  not 
been  developed  for  T>1,  we  have  noticed  from  simulation  that  the 
selection  of  1<T<4  has  very  little  effect  on  the  error  performance. 
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SECTION  VI 


SIMULATED  PERFORMANCE  OF  THE  MSA 


In  comparison  with  the  Viterbi  algorithm,  the  MSA  also  achieves 
erasurefree  convolutional  decoding,  but  with  a  modest  decoding  effort, 
at  the  expense  of  increased  memory  requirements.  But  the  steadily 
decreasing  cost  and  increasing  capacity  of  microcomputer  memories 
makes  this  tradeoff  appear  worthwhile.  In  contrast,  the  Viterbi 
maximum  likelihood  algorithm  has  a  computational  effort  fixed  by 
the  constraint  length.  This  computational  effort  grows  exponentially 
as  the  constraint  length  is  increased.  Increased  memory  cannot  be 
employed  as  efficiently  by  the  Viterbi  algorithm  as  by  the  MSA.  In 
this  section,  we  present  simulation  results  showing  the  performance 
of  the  MSA  relative  to  the  Viterbi  algorithm.  Basically,  we  have 
concluded  that  similar  error  rates  can  be  maintained  at  faster  through¬ 
put  with  similar  computational  complexity  by  using  the  MSA. 

6.1  Parameter  Values 


The  residual  error  performance  was  measured  during  all  trials  as 
a  function  of  a  binary  symmetric  channel  input  signal-to-noise  ratio 
(SNR)  in  the  range  of  4.5  to  7  dB.  Equivalently,  the  conditions 
shown  in  Table  V  apply  for  a  gaussian  channel  crossover  probability 
in  the  range  of  0.013  <i  p  <  0.045. 

The  parameters  (T,  Z, ,  Z.,  C,.  )  of  the  MSA  were  determined  in 

1  I  lim 

accordance  with  the  discussion  of  Section  V.  The  values  of  Z^  and 
were  selected  primarily  to  achieve  optimum  performance  with 
available  memory  In  the  Z-80.  In  addition,  Z^  was  chosen  to  fit  the 
Z-80  memory  and  was  made  compatible  with  (or  larger  than) 
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TABLE  V 


General  Parameter  Values  for  (2,1)8  MSA  and  (2,1)15  MSA  Simulations 


— 

^^^lgorithms 

Parameters 

(2,1)8  MSA 

(2,1)15  MSA 

free  distance,  d,. 

free 

10 

17 

CODE 

generator  poly¬ 
nomial  ,  G-^ 

(1344) g 

(141512) g 

generator  poly¬ 
nomial,  g2 

(1434)g 

(136266)g 

total  number  of 
information  bits 
processed 

>  106 

>  106 

DATA 

number  of  infor¬ 
mation  bits  per 
frame,  k 

64 

64 

number  of  code  bits 
per  frame,  n^ 

142 

156 

first  stack  size, 

1024 

2900 

computational  limit, 

1700 

3600 

MSA 

lim 

secondary  stack  size, 
Zi 

11 

11 

number  of  nodes, 
transferred,  T 

3 

3 
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to  obtain  erasurefree  decoding.  The  discussion  below  concerns  four 
parameters  which  determine  the  overall  performance  of  the  MSA: 

(1)  residual  probability  of  error,  (2)  required  stack  size  or  storage 
and  (3)  average  number  of  decoding  steps  or  computations,  or 
equivalently,  (4)  throughput.  We  shall  see  later  that  (3)  and  (4) 
are  interchangeable  as  measures  of  decoder  speed. 

6.2  Error  Performance  of  the  MSA 

Sequential  decoding  has  the  characteristic  that  when  used  with 
appropriate  tree  codes  to  signal  over  memoryless  channels  the 
probability  of  error  decreases  exponentially  toward  zero  at  all 
transmission  rates  less  than  channel  capacity.  For  the  MSA,  which 
is  a  new  type  of  sequential  decoding,  we  expect  this  behavior. 

Before  obtaining  experimental  results  that  typically  represent 
the  MSA  performance,  various  simulations  were  performed  to  select 
adequate  decoding  parameters,  especially  those  which  influence  the 
error  rate.  We  found  that  although  decreases  with  increasing  Z^, 
and  also  decreases  with  increasing  C.^m,  Z^  and  play  completely 

different  roles  in  the  MSA  decoding.  Z^,  the  first  stack  size,  should 
be  made  large  enough  to  approximate  SSA  decoding  for  most  blocks  so 
that  few  blocks  would  necessitate  secondary  stack  operations.  For 
those  very  noisy  blocks,  is  made  at  least  equal  to  Ccrit  above 

which  erasurefree  decoding  is  achieved.  Therefore,  allows  at 

least  one  tentative  decision  during  secondary  stack  processing  before 
the  machine  halts  because  of  saturation.  We  found  that  this  aspect  of 
the  MSA  improves  the  error  performance (by  about  20%,  on  the  average 
during  all  trials  for  practical  error  rates),  in  comparison  with  a 
random-guess  upon  SSA  saturation. 

We  found  that  it  was  not  feasible  to  improve  performance  by 
linearly  increasing  Cllm  any  further.  We  believe  that  for  larger 
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computational  limits,  the  MSA  will  try  to  return  to  previous  stacks 

to  improve  the  first  tentative  decision.  Under  very  noisy  conditions 

which  require  secondary  stacks,  the  correct  path  has  such  a  low  metric 

that  it  is  buried  deeply  in  one  of  the  stacks.  The  return-to-stack 

operation  would  only  tend  to  form  more  stacks  without  making  any 

real  contribution  toward  locating  the  correct  path  in  the  tree. 

Finally,  the  MSA  reaches  the  larger  and  usually  outputs  the 

first  tentative  decision.  We  tested  this  hypothesis  and  found  that 

for  the  (2,1)8  MSA,  a  value  of  1700  for  C. .  produced  the  same  results 

lim 

as  did  a  value  of  5120,  and  that  for  the  (2,1)15  MSA,  a  value  of 

3600  for  gave  the  same  results  as  a  value  of  8192.  Consequently, 

we  selected  the  lower  C,.  value  to  achieve  erasurefree  decoding. 

lim 

The  results  of  the  simulation  for  the  stated  parameter  values 
are  given  in  Tables  VI  through  X.  Curves  of  decoded  error  rate  are 
plotted  in  Figure  10  in  comparison  with  the  performance  of  the  (2,1)8 
Viterbi  algorithm  decoder  under  the  same  channel  noise  environment . 

We  concluded  that  the  (2,1)8  Viterbi  algorithm  performs  worse  than 
the  (2,1)15  MSA,  but  it  clearly  outperforms  the  (2,1)8  MSA  in  terms 
of  error  rate.  The  increase  of  constraint  length  from  8  to  15  adds 
modest  complexity  to  the  MSA.  Both  the  (2,1)8  and  the  (2,1)15  MSA 
provide  erasureless  decoding  as  does  the  Viterbi  algorithm.  The 
MSA  has  additional  flexibilities  available.  The  user  can  adjust  the 
parameters  to  a  certain  extent  to  fit  the  computational  assets  while 
achieving  acceptable  decoder  error  performance,  and  the  error  per¬ 
formance  becomes  asymptotically  optimum  if  enough  decoding  memory  is 
provided.  Further  consideration  will  be  given  to  storage  and  through¬ 
put  in  the  following  paragraphs. 

6. 3  Storage  Needed  for  the  MSA 

The  storage  requirements  of  the  MSA  relate  directly  to  the 
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(SIMULATED)  SIGNAL  TONOISE  RATIO. 


Figure  10  ERROR  PERFORMANCE  COMPARISONS  OF  (2.1)  S  MSA,  (2.1)  8  VITERBI  AND 
(2.11  IS  MSA  ALGORITHMS 
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TABLE  VI 

Error  Rate  and  Throughput  Performance  of  the  (2,1)8  MSA 


^V^erformance 

^^arameters 

Decoded  Error  Rate 

Throughput  (BPS) 

4.609 

_3 

2.184  x  10 

160 

4.812 

1.155  x  10-3 

266 

5.032 

6.45  x  10~4 

379 

5.272 

3.79  x  10'4 

505 

5.529 

1.92  x  10~4 

-5 

630 

5.823 

9  x  10  3 

750 

6.129 

3  x  10~5 

950 

6.298 

1.3  x  10-5 

1050 

6.465 

6  x  10~6 

1126 

6.669 

2.8  x  10-6 

1150 
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TABLE  VII 


Error  Rate  and  Throughput  Performance  of  the  (2,1)15  MSA 


\P  e  r  f  o  rmanc  e 
^■\Parameters 

(Eb/No>d/^\ 

— 

Decoded  Error  Rate 

Throughput  (BPS) 

4.609 

8.16  x  10-4 

63 

4.812 

4.48  x  10-4 

108 

5.032 

3.34  x  10-4 

158 

5.272 

2.12  x  10-4 

233 

5.529 

9.1  x  10~5 

330 

5.823 

2.2  x  10-5 

530 

6.129 

3.5  x  10-6 

770 

6.298 

1.2  x  10~6 
-7 

900 

6.465 

4  x  10 

-7 

980 

6.669 

1.8  x  10 

1000 
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TABLE  VIII 


Error  Rate  and  Throughput  Performance  of  the 
(2,1)8  Viterbi  Algorithm 


^^Performance 

^V^arameters 

Decoded  Error  Rate 

Throughput  (BPS) 

4.609 

1.4  x  10“3 

50 

4.812 

7.72  x  10~4 

50 

5.032 

5  x  10"4 

50 

5.272 

2.3  x  10-4 

50 

5.529 

1.3  x  10-4 

50 

5.823 

5  x  10-5 

50 

6.129 

1.4  x  10“5 

50 

6.298 

6  x  10~6 

50 

6.465 

2.3  x  10~6 

50 

6.669 

1.1  x  10-6 

50 
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TABLE  IX 


Performance  Comparison  Under  Quiet  (p  =  0.029)  Situation 


^  v  Algorithms 

Performance^'v^ 

Parameters 

(2,1)8  MSA 

(2,1)8  VA 

(2,1)15  MSA 

Decoded 

Error  Rate 

1.92  x  lO-4 

1.3  x  10“4 

9.1  x  10-5 

Throughput 

630  BPS 

50  BPS 

330  BPS 

Storage 

18  kBytes 

2  kBytes 

44  kBytes 

TABLE  X 

Performance  Comparison  Under  Noisy  (p  =  0.045)  Situation 


Algorithms 

Performance 

Parameters 

(2,1)8  MSA 

(2,1)8  VA 

(2,1)15  MSA 

Decoded 

Error  Rate 

2.18  x  10-3 

1.4  x  10"3 

8.16  x  10-4 

Throughput 

160  BPS 

50  BPS 

63  BPS 

Storage 

18  kBytes 

2  kBytes 

45  kBytes 
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implementation  cost  of  the  decoder.  These  requirements  increase 
linearly  with  the  constraint  length  while  those  of  the  Viterbi 
algorithm  increase  exponentially  as  shown  in  Figure  11.  As  we 
mentioned  before,  MSA  memory  is  traded  off  for  tolerable  error 
performance  and  erasurefree  decoding.  We  found,  for  example,  that 
the  available  Z-80  memory  (44k  bytes)  obtained  nearly  optimum 
performance  for  the  (2,1)8  code.  Additional  memory  could  be  used  to 
slight  advantage.  For  the  (2,1)15  code,  however,  the  Z-80  memory 
proved  inadequate;  the  longer  code  required  additional  memory  which 
was  conveniently  obtained  by  borrowing  from  a  collocated  NOVA 
minicomputer . 

There  are  15  bytes  for  each  stack  entry.  For  the  (2,1)8  MSA 
with  *  1700,  approximately  25k  bytes  of  memory  are  required  for 

processing.  For  the  (2,1)15  MSA,  the  memory  requirement  for  = 

3600  is  54k  bytes  which  is  quite  achievable  on  a  special  purpose 
micro-computer  system  (although  not  available  on  our  Z-80  which  had 
44k  bytes  available  for  processing  after  accommodating  program  storage). 
The  goal  of  using  low-cost  microprocessors  to  implement  high-performance 
erasureless  sequential  decoding  seems  quite  attainable. 

It  is  also  helpful  to  talk  about  storage  in  terms  of  stacks.  The 
number  of  stacks  required  for  each  frame  (or  block)  is  dependent  upon 
the  size  of  the  first  stack,  the  size  of  the  higher-order  stacks,  and 
the  computational  limit.  The  larger  the  first  stack  is  made,  the  less 
probable  it  is  that  the  decoder  will  create  secondary  stacks.  Thus, 
it  is  more  likely  that  the  MSA  will  behave  like  the  SSA.  If  the 
higher-order  stacks  are  made  larger  while  fixing  other  parameters,  the 
required  number  of  stacks  will  be  relatively  smaller.  However,  more 
computations  will  be  needed  due  to  the  increasing  size  of  the  secondary 
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stacks.  The  increase  of  computational  limit  will  cause  more  stacks 
to  be  formed  before  the  decoder  terminates,  which  means  large  storage 
is  required.  For  our  goal  of  low  cost  and  modest  memory  requirements, 
the  first  stack  size  is  large  while  the  higher-order  stack  size  and 
computational  limit  are  small  (yet  computational  limit  must  exceed 
Ccrit  t0  achieve  erasurefree  decoding).  For  our  selected  parameters, 
the  probability  P(u>v)  (that  the  number  of  stacks  u  needed  to  reach 
the  first  tentative  decision  exceeds  some  number  v)  is  an  exponentially 
decreasing  function  of  v,  as  can  be  seen  in  Figure  12  and  13.  Most 
of  the  blocks  are  processed  with  a  relatively  small  amount  of  storage. 
Also  note  that  the  probability  of  SSA  operation  is  much  higher  (99,95% 
of  the  time  for  (2,1)8  and  99.99%  of  the  time  for  (2,1)15)  than  is 
the  probability  of  forming  higher-order  stacks  because  Z^  is  large. 

The  MSA  will  form  a  number  of  higher-order  stacks  when  p  is  large 
(p  •*  0.045,  for  example)  since  Z^  is  too  small  to  approximate  the 
SSA  decoder.  Once  a  second  stack  is  formed,  it  is  likely  to  also  form 
other  higher-order  stacks.  The  reason  is  simply  that  the  MSA  is 
processing  some  badly  corrupted  sequences  and  the  size  of  higher-order 
stacks  is  too  small  (Z^  =  11)  to  carry  out  the  decoding  without 
creating  additional  stacks.  Z^  cannot  be  made  large  because  in  that 
case  although  less  stacks  will  be  required,  computation  effort  will 
be  too  large  to  handle.  It  is  fair  to  say  that  we  would  rather  stay 
at  the  first  stack  all  the  time  to  finish  decoding,  but:  if  the  noise 
is  severe  enough  to  use  secondary  stacks,  a  large  number  of  higher- 
order  stacks  should  be  available  to  process  these  worst-case  sequences. 
Therefore,  we  have  made  available  a  number  of  secondary  stacks  in 
the  range  0  £  u  £  256. 

We  conclude  this  discussion  on  storage  requirements  with  a 
comparison  between  the  storage  required  for  the  MSA  and  that  of  the 
Viterbi  algorithm.  The  storage  required  for  the  MSA  at  p  ■  0.029  is 
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about  20k  bytes  for  the  (2,1)8  code  (there  are  1354  entries  each  with 
a  15  byte  array)  and  about  54k  for  the  (2,1)15  code  (there  are  3153 
entries  each  with  a  15  byte  array).  For  the  (2,1)8  Viterbi  algorithm, 
only  2k  bytes  are  needed  for  storage  in  the  Z-80  but  the  requirement 
would  increase  to  368k  bytes  if  the  (2,1)15  decoder  were  to  be  imple¬ 
mented.  That  is  the  main  reason  why  the  Viterbi  algorithm  is  con¬ 
strained  to  relatively  short  codes  while  the  MSA  can  accommodate 
longer  codes. 


6.4  Computations  and  Throughput  of  the  MSA 

There  are  two  measures  of  decoding  speed  which  are  closely 
related  to  each  other;  one  is  the  number  of  computations  or  node 
extensions;  the  other  is  the  actual  throughput.  The  latter  has 
more  significance  when  we  are  comparing  different  decoding  algorithms. 

The  time  consumed  by  the  decoder  is  approximately  proportional 

to  the  number  of  decoder  computations  that  are  performed,  where  a 

single  decoder  computation  of  the  MSA  comprises  all  the  operations 

performed  for  each  node  extension  (including  metric  calculations, 

ordering  and  possibly  stack  transfer /deletion) .  For  the  stack 

algorithm  the  average  number  of  decoder  computations  per  decoder 

infoi  ation  bit  is  bounded  by  a  constant  for  all  rates  less  than 

R  .  The  R  is  a  function  of  the  channel  transition  probabilities 
comp  comp 

only,  and  exceeds  one-half  the  channel  capacity  for  all  nonpathological 

memoryless  channels.  The  average  number  of  computations  increases 

exponentially  with  k  if  R>R  .  Therefore,  for  large  k,  one  would 

ordinarily  not  attempt  to  use  a  stack  algorithm  at  rates  that 

exceed  R  ;  i.e.,  for  BSC  crossover  probability  p  >  0.045.  It 
comp 

was  stated  previously  in  Section  II  that  even  the  SSA  shows  a 
significant  speed  advantage  over  the  Fano  algorithm  for  the  BSC. 

But  the  computational  distribution  is  Pareto  for  the  SSA. 
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Below  R  ,  the  search  increases  only  linearly  and  it  is 
ccmp 

intuitively  satisfying  to  see  that  as  long  as  the  decoding  work  has 

the  same  character  as  that  undertaken  by  the  maximum  likelihood 

decoder,  the  error  results  are  the  same.  At  rates  below  R  ,  the 

comp 

error  performance  is  close  to  optimal,  whereas  the  decoding  effort 

increase  is  only  linear  with  k.  As  the  channel  rate  R  approaches  close 

to  Rcomp  from  below,  the  computational  load  increases  tremendously. 

For  R  <  R  ,  P(C>C  )  =  AC  ^  where  A  is  a  constant  of  proportionality; 
comp  v  v 

but  the  average  decoding  computational  effort  becomes  infinite  when 

R  =  ^comp*  While  this  does  not  necessarily  happen  in  practice  (the 

theoretical  result  is  only  a  bound),  the  dramatic  increase,  even  as 

R  exceeds  about  0.9  R  ,  has  certainly  been  observed.  For  the  MSA, 

comp 

p  (C  >  Cv)  is  an  exponentially  distributed  function  of  (Property  5 
of  Section  V)  as  shown  in  Figures  14-16.  The  number  of  computations 
on  the  average  is  very  small  and  for  considerably  less  than  Ccrit> 
p  (C  >  can  be  made  small  also.  This  also  means  the  computations 

needed  to  reach  the  first  tentative  decision  are  almost  always  less 
than  and  erasurefree  decoding  is  achieved. 


The  number  of  computations  for  the  MSA  is  a  random  variable;  we 
found  the  average  is  1.37  computations  per  information  bit  for  the 
(2,1)8  code  and  1.41  for  the  (2,1)15  code.  The  number  of  computa¬ 
tions  of  the  Viterbi  algorithm  is  a  constant  decoding  effort  of  128 
computations  per  information  bit  for  the  (2,1)8  code.  Although  the 
average  number  of  computations  of  the  MSA  is  much  less  than  that  of 
the  Viterbi  algorithm  (by  two  orders  of  magnitude) ,  the  operations 
needed  to  be  performed  for  each  computation  (stack  order/deletion/ 
creation)  are  more  complicated  and  consume  more  computer  time  than 
those  of  the  simple  add/compare/select  operation  of  the  Viterbi 
algorithm.  With  the  software  implementation  of  the  MSA,  erasurefree 
decoding  is  also  obtained  at  the  expense  of  some  computer  time  (com¬ 
putational  limit  needs  to  be  set  sufficiently  large).  The  stack 
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F.*ir«  IB  THE  EROS  ABILITY  DISTRIBUTION  THAT  THE  NUMBER  OF  COMPUTATIONS  PERFORMED  FOR 
ERASUREFREE  DECODING  C  IS  LARGER  THAN  SOME  NUMBER  C,  (1900  ^  C,  <  30001  FOR  (2.1)  15  MSA 
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dreation/order ing/deletion  operation  is  about  10  times  slower  than 
the  add/compare/select  operation. 


We  see  a  close  relationship  between  computational  limit  and 

the  number  of  stacks  u.  is  actually  the  number  of  entries 

which  must  be  put  in  u  stacks.  Since  the  first  stack  size  is  Z^  and 
each  secondary  stack  size  is  Z^,  the  number  of  stacks  required  is 
therefore: 


where 


u  * 


+  1 


(22) 


accounts  for  the  number  of  secondary  stacks 


and  the  added  1  accounts  for  the  first  stack. 

If  we  fix  the  computational  limit,  Clim  (Clim  =  1700),  the 
first  stack  size,  Z^  (Z^  =  1024)  and  the  secondary  stack  size,  Z^ 

(Z±  =  11) ,  the  maximum  number  of  stacks  required  is  determined  to  be 
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The  second  measure  which  gives  the  actual  decoding  speed  is  the 
throughput  of  the  decoder.  It  counts,  in  terms  of  bits/second,  the 
speed  of  the  MSA  operation  while  excluding  the  processing  time 
of  the  other  peripheral  devices  (encoder,  noise  generators,  etc.). 
The  throughput  is  a  more  realistic  measure  than  the  number  of 
computations  and  reflects  the  actual  speed  advantage  of  the  MSA 
over  the  Viterbi  algorithm.  The  throughput  of  the  (2,  1)8 
MSA  under  moderate  noise  environment  (p  =  0.029)  is  500  bits/second, 
as  compared  to  50  bits/second  for  the  (2,1)8  Viterbi  algorithm. 
Therefore,  the  MSA  is  actually  about  ten  times  faster  than  the 
Viterbi  algorithm  to  achieve  comparable  error  performance.  A  compari¬ 
son  of  throughput  is  shown  in  Figure  17  and  Table  IX  and  X.  Notice 
that  MSA  shows  a  varying  throughput  while  the  Viterbi  algorithm  gives 
constant  throughput  for  various  noisy  conditions.  These  results 
further  demonstrate  the  variable  nature  of  the  computational  effort  of 
the  MSA. 
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Figure  1 7  THROUGHPUT  COMPARISONS  OF  (2,1)8  MSA, 
12.1)  15  MSA  AND  (2.1  IS  VITERBI  ALGORITHMS 
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SECTION  VII 

CONCLUSIONS  AND  SUGGESTIONS  FOR  FURTHER  WORK 


In  this  report,  we  have  examined  a  software  implementation  of 
the  Multiple  Stack  Sequential  Decoding  Algorithm  (MSA)  on  the  Z-80 
microcomputer  system  and  have  discussed  the  resulting  performance 
subject  to  certain  rules  of  parameter  selection.  Techniques  have 
been  described  for  maintaining  a  tolerable  undetected  error  probability 
as  the  SNR  decreases.  The  utilization  of  the  MSA  has  eliminated  the 
erasure  problem  caused  by  incomplete  decoding  in  other  sequential  de¬ 
coding  procedures.  The  major  limiting  property  of  the  Single  Stack 
Algorithm  (SSA) ,  Pareto  computational  distribution,  has  been  removed  by 
processing  multiple  stacks  arranged  in  parallel;  the  computational  dis¬ 
tribution  is  now  an  exponentially  decreasing  function  of  some  number  C^. 
The  feasibility  of  implementing  both  the  Viterbi  algorithm  and  the  MSA 
on  the  Z-80  system  has  also  been  verified.  In  this  section,  we  will 
draw  some  conclusions  and  make  suggestions  based  on  the  following: 

1.  performance  comparison  of  MSA  and  Viterbi  algorithm  with 
respect  to: 

a.  decoded  error  rate, 

b.  decoding  effort, 

c.  storage  requirement, 

2.  soft  quantization  possibilities  for  MSA, 

3.  real-time  (2,1)8  MSA  decoder  possibilities  and  feasibility 
requirements , 

A.  performance  of  the  MSA  on  a  burst  noise  model, 

5.  suggested  further  research  areas. 
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7.1  Performance  Comparisons  of  the  MSA  and  Viterbl  Algorithm 


The  performances  of  the  MSA  and  the  Viterbi  algorithm  have  been 
compared  in  respect  to  the  decoded  error  rate,  the  decoding  effort 
(or  throughput  rate)  and  the  storage  requirements.  We  have  concluded 
that  the  decoded  error  rate  of  the  (2,1)8  MSA  is  slightly  worse  (less  than 
0 , 5dB)  than  that  of  the  (2,1)8  Viterbi  algorithm,  but  the  throughput  rate 
is  ten  times  faster  when  implemented  on  the  Z^80.  If  the  user  can  tolerate 
some  degradation  of  error  performance  to  gain  more  speed,  the  (2,1)8 
MSA,  with  a  storage  requirement  ten  times  larger  than  that  of  Viterbi 
algorithm,  is  certainly  an  effective  alternative.  The  (2,1)15  MSA 
performs  better  than  the  (2,1)8  Viterbi  algorithm  with  respect  to 
decoded  errors,  while  the  average  throughput  rate  is  still  somewhat 
faster.  However,  the  storage  requirement  is  about  twenty  times 
larger.  But,  considering  the  declining  cost  of  microcomputer  memory, 
the  (2,1)15  MSA  may  constitute  the  most  effective  alternative  to  a 
(2,1)15  Viterbi  algorithm  if  a  complete  decoding  method,  which 
achieves  low  error  probabilities  at  acceptable  speeds,  is  desired. 

7.2  Soft  Quantization  Possibilities  for  MSA 

For  the  Viterbi  algorithm,  soft  quantization  is  easy  to  implement 

and  is  a  good  practice.  The  2  dB  gain  is  usually  obtained  by  processing 

the  sof t  quantized  data  at  the  input  section,  and  then  is  incorporated 

in  the  branch  and  path  metrics  without  requiring  additional  storage. 

The  MSA,  on  the  other  hand,  must  store  several  thousand  branches  of 

data  according  to  the  size  of  the  stack;  each  would  contain  log2 

-jp—  bits  for  rate  R  and  Q-level  quantization.  The  complicated 
c 

operations  of  the  decoder,  such  as  stack  creation,  stack  transfer, 
and  stack  deletion,  make  the  task  of  soft  quantization  even  more 
unpredictable.  The  2  dB  improvement  promised  by  theory  might  not  be 
worth  the  added  complexity. 
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7 . 3  Real-Time  (2,1)8  MSA  Feasibility  Requirements 

The  MSA  was  tested  using  mainly  software  implementations  with  a 
simulated  channel  model.  If  a  special-purpose  hardware  decoder  and 
peripheral  devices  were  used,  the  data  rate  could  have  been  high 
enough  for  real-time  consideration  and  on-line  processing  low  data- 
rate  applications  where  special-purpose  Viterbi  decoders  are  currently 
being  used.  As  for  the  software  Z-80  MSA  decoder,  only  quite  low  data 
rate  was  possible  because  of  the  speed  limit  of  the  Z-80  (in  tie  range 
of  a  few  hundred  bits  per  second) .  The  software  Viterbi  algorithm 
achieves  data  rates  of  only  a  few  tens  of  bits  per  second  on  the 
same  machine.  The  combination  of  the  MSA  and  next-generation 
processors  may  prove  to  be  more  rewarding. 

Judging  from  the  decoding  operations  of  the  MSA,  it  would  be 
more  timesaving  for  the  microcomputer  system  to  have  the  following 
instructions. 

(1)  Compare 

(2)  Decrement  and  jump  (an  instruction  that  decrements  an  index 
register  and  jumps  if  the  register  content  is  non-zero). 

The  presence  of  more  than  one  accumulator  would  reduce  the  decoding 
time  even  more.  The  choice  of  the  Z-80  type  of  system  is  justified 
in  this  aspect.  A  few  other  features  that  would  be  helpful  to 
achieve  real-time  decoding  are: 

(1)  all  computer  time  devoted  to  MSA  decoding, 

(2)  maximum  computations,  varied  according  to  allowable 

processing  time. 
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(3)  adjustable  delay  for  Information  delivery  (i.e.,  variable 
buffer) , 

(4)  faster  instruction  cycle  time  with  a  large  speed  advantage 
to  keep  up  with  incoming  data, 

(5)  different  machines  operating  under  a  central  control, 
each  performing  a  single  operation  such  as  stack  ordering, 
stack  deletion,  or  stack  transfer,  or 

(6)  different  machines  operating  independently,  each  performing 
several  operations  and  each  exploring  different  paths  in  the 
code  tree. 

A  16-bit  microcomputer  is  suitable  for  feature  4.  The  bit-slice 
bipolar  microcomputer  is  appropriate  for  features  5  or  6. 

Because  it  is  the  message  that  cannot  be  decoded  that  limits 
the  real-time  decoder  performance,  it  is  also  this  message  which 
must  be  considered  when  speed  and  memory  requirements  are  determined 
for  the  MSA.  All  codewords  could  be  successfully  and  completely 
decoded  given  enough  memory  and  processing  time  for  rates  less  than 
channel  capacity.  Even  with  a  somewhat  constrained  computational 
limit,  the  MSA  is  able  to  obtain  some  decision  which  is  better  than 
the  random-guess  version  of  the  finite-stack  SSA. 

7 . 4  Performance  of  the  MSA  on  Burst  Noise  Model 

It  was  expected  that  sequential  decoding  would  be  unsuitable 
as  a  burst-correcting  technique.  The  variability  of  the  decoding 
computation  time  prohibits  its  successful  utilization  as  a  burst 
decoder.  A  burst  model  was  implemented  to  further  verify  that  this 
consideration  applies  also  to  the  MSA.  The  error  rate  and  throughput 
performance  showed  that  the  MSA  cannot  handle  bursts  at  all  and 
interleaving  and  de-interleaving  techniques  as  are  required  with  the 
Viterbi  algorithm  must  be  applied  to  obtain  tolerable  performance. 


4 


7. 5  Suggestions  for  Further  Study 


Further  study  in  the  use  of  the  multiple  stack  algorithm  as  a 
low-cost,  efficient  sequential  decoding  method  should  include  the 
design  of  a  real-time  MSA  decoder  at  useful  data  rates.  It  should 
also  consider  the  implementation  of  the  MSA  on  either  a  faster  16-bit 
NMOS  microprocessor  (the  Z-8000,  for  instance)  or  on  a  bit-slice 
bipolar  microprocessor  suitable  for  parallel  processing. 
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